Multi-Accent Speech Recognition with Hierarchical Grapheme Based Models


We explore the viability of grapheme-based speech recognition, specifically how it compares to phoneme-based equivalents. We use the CTC loss to train models that directly predict graphemes. We also train models with hierarchical CTC and show that they improve on previous CTC models. We further explore how the grapheme and phoneme models scale with large data sets, considering a single acoustic training data set that combines various dialects of English from the US, UK, India and Australia. We show that by training a single grapheme-based model on this multi-dialect data set we create an accent-robust ASR system.
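
To make concrete what it means for a CTC model to predict graphemes directly, the following minimal sketch shows the standard CTC collapsing rule applied to a per-frame label sequence: consecutive repeated labels are merged, then blank labels are dropped. This is an illustration only, not the system described here. The blank symbol `_`, the function name `ctc_collapse`, and the example frame sequence are all assumptions introduced for the example.

```python
from itertools import groupby

# Hypothetical symbol standing in for the CTC blank label (an assumption
# for this sketch; real systems typically use a reserved integer index).
BLANK = "_"


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label sequence to a grapheme string.

    Applies the CTC collapsing rule: first merge runs of identical
    consecutive labels, then remove every blank label.
    """
    # groupby yields one key per run of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# Illustrative per-frame best labels, one character per acoustic frame.
frames = list("__hh_e_ll_ll__oo_")
print(ctc_collapse(frames))  # prints "hello"
```

Note that the blank between the two runs of `l` is what lets the output keep a double letter. Without it, the repeated frames would merge into a single grapheme.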