
On Lattice Generation for Large Vocabulary Speech Recognition

Abstract

Lattice generation is an essential feature of the decoder for many speech recognition applications. In this paper, we first review lattice generation methods for WFST-based decoding and describe, in a uniform formalism, two established approaches for state-of-the-art speech recognition systems: the phone pair and the N-best histories approaches. We then present a novel optimization method, pruned determinization followed by minimization, that produces a deterministic minimal lattice retaining all paths within specified weight and lattice size thresholds. Experimentally, we show that before optimization, the phone pair and the N-best histories approaches each perform better under some conditions when evaluated on video transcription and on mixed voice search and dictation tasks. However, once this lattice optimization procedure is applied, the phone pair approach has the lowest oracle WER for a given lattice density by a significant margin. We further show that the pruned determinization presented here is efficient enough to use during decoding, unlike the classical weighted determinization from which it is derived. Finally, we consider on-the-fly lattice rescoring, in which lattice generation and combination with the secondary LM are done in one step. We compare the phone pair and N-best histories approaches for this scenario and find the former superior in our experiments.
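To make the notion of a weight threshold concrete, the following is a minimal sketch of plain weight-based lattice pruning. It is not the pruned determinization algorithm of this paper, only an illustration of what "retaining all paths within a weight threshold" can mean. It assumes an acyclic lattice in the tropical semiring (weights are costs, a path weight is the sum of its arc weights, lower is better), a single start and a single final state, and states numbered in topological order. The function name `prune_lattice` and the arc tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def prune_lattice(arcs, start, final, threshold):
    """Keep arcs that lie on some path within `threshold` of the best path.

    arcs: list of (src, dst, label, weight) tuples (hypothetical layout).
    Assumes an acyclic lattice whose integer state ids are topologically
    ordered, i.e. src < dst for every arc.
    """
    # Forward pass: cost of the cheapest path from `start` to each state.
    fwd = defaultdict(lambda: math.inf)
    fwd[start] = 0.0
    for src, dst, _, w in sorted(arcs, key=lambda a: a[0]):
        fwd[dst] = min(fwd[dst], fwd[src] + w)

    # Backward pass: cost of the cheapest path from each state to `final`.
    bwd = defaultdict(lambda: math.inf)
    bwd[final] = 0.0
    for src, dst, _, w in sorted(arcs, key=lambda a: a[0], reverse=True):
        bwd[src] = min(bwd[src], w + bwd[dst])

    best = fwd[final]
    # An arc survives if the best complete path through it is close enough
    # to the overall best path.
    return [
        (src, dst, label, w)
        for (src, dst, label, w) in arcs
        if fwd[src] + w + bwd[dst] <= best + threshold
    ]


# Toy lattice: two alternatives for each of two word positions.
toy_arcs = [
    (0, 1, "a", 1.0),
    (0, 1, "b", 3.0),
    (1, 2, "c", 1.0),
    (1, 2, "d", 1.5),
]
kept = prune_lattice(toy_arcs, start=0, final=2, threshold=1.0)
kept_labels = [label for (_, _, label, _) in kept]
```

In the toy lattice the best path costs 2.0, so with a threshold of 1.0 the arc labeled "b" (whose best complete path costs 4.0) is dropped while "a", "c", and "d" are kept. A size threshold, as mentioned in the abstract, would additionally bound the number of states or arcs retained; that is not modeled in this sketch.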