Lower Frame Rate Neural Network Acoustic Models

TLDR

On a large-vocabulary Voice Search task, conventional neural network acoustic models can run at a frame rate as low as 40ms while improving WER by 3% relative over a CTC-based model. Because the model emits one decision every 40ms instead of every 10ms, overall system speed also improves.
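One common way to lower an acoustic model's frame rate is to stack consecutive 10ms feature frames and keep only every fourth stacked frame. The model then sees one (wider) input and emits one decision per 40ms. The following is a minimal sketch under stated assumptions, not necessarily the exact front end used in this work. The function name `stack_and_subsample`, the stack size of 8 and the 40-dimensional features are hypothetical choices for illustration.

```python
from typing import List

Frame = List[float]

INPUT_SHIFT_MS = 10  # assumed frame shift of the input features (10ms)


def stack_and_subsample(frames: List[Frame], stack: int, stride: int) -> List[Frame]:
    """Hypothetical helper: concatenate `stack` consecutive frames and emit
    one stacked frame every `stride` input frames (frame-rate reduction)."""
    if stack < 1 or stride < 1:
        raise ValueError("stack and stride must be positive")
    out: List[Frame] = []
    # Only start windows that fit entirely inside the utterance.
    for start in range(0, len(frames) - stack + 1, stride):
        stacked: Frame = []
        for frame in frames[start:start + stack]:
            stacked.extend(frame)  # concatenate feature vectors
        out.append(stacked)
    return out


def output_shift_ms(stride: int) -> int:
    """Frame shift after subsampling: a stride of 4 turns 10ms into 40ms."""
    return INPUT_SHIFT_MS * stride


# 1 second of audio: 100 frames of assumed 40-dim features, each frame filled with its index t.
utterance = [[float(t)] * 40 for t in range(100)]
low_rate = stack_and_subsample(utterance, stack=8, stride=4)
```

With a stride of 4, the model runs about four times fewer times per second of audio than at 10ms. Stacking more frames than the stride keeps some overlap between neighbouring inputs, so each decision still sees context from its neighbours.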