Xavier Velez

Authored Publications

Improving Automatic Speech Recognition with Neural Embeddings
Christopher Li
2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Abstract: A common challenge in automatic speech recognition (ASR) systems is successfully decoding utterances that contain long-tail entities. Examples include unique contact names and local restaurant names that may be out of vocabulary and therefore absent from the training set. As a result, such entities are assigned low likelihoods by the model during decoding and are unlikely to be recognized. In this paper, we apply retrieval in an embedding space to recover such entities. In this embedding space, the representations of phonetically similar entities are designed to be close to one another in cosine distance. We describe the neural networks and the infrastructure used to produce such embeddings, and we demonstrate that using neural embeddings improves ASR quality, achieving over a 50% reduction in word error rate (WER) on evaluation sets of popular media queries.
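
The retrieval step this abstract describes reduces to nearest-neighbor search under cosine similarity. Below is a minimal sketch of that lookup, assuming the query and entity embeddings have already been produced by the neural encoder (not reproduced here); the function names and toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def retrieve(query_emb: np.ndarray, entity_embs: np.ndarray,
             entities: list, k: int = 5) -> list:
    """Return the k entities whose embeddings are closest to the query
    in cosine distance (i.e., highest cosine similarity)."""
    sims = normalize(entity_embs) @ normalize(query_emb)  # (n_entities,)
    return [entities[i] for i in np.argsort(-sims)[:k]]

# Toy usage: random vectors stand in for real phonetic embeddings.
entities = ["The Shawshank Redemption", "Shazam", "Showtime"]
entity_embs = np.random.randn(3, 64)   # one row per candidate entity
query_emb = np.random.randn(64)        # embedding of the misrecognized span
print(retrieve(query_emb, entity_embs, entities, k=2))
```

In practice the candidate set would be large (e.g., a contact list or a media catalog), so the brute-force matrix product above would typically be replaced by an approximate nearest-neighbor index, but the cosine-distance criterion is the same.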

Towards Acoustic Model Unification Across Dialects
Meysam Bastani
Mohamed G. Elfeky
Pedro Moreno
2016 IEEE Spoken Language Technology Workshop (SLT)

Abstract: Research has shown that acoustic model performance typically decreases when a model is evaluated on a dialectal variation of the same language that was not used during training. Similarly, models trained simultaneously on a group of dialects tend to under-perform dialect-specific models. In this paper, we report on our efforts towards building a unified acoustic model that can serve a multi-dialectal language. Two techniques are presented: distillation and multi-task learning (MTL). In distillation, we use an ensemble of dialect-specific acoustic models and distill its knowledge into a single model. In MTL, we train a unified acoustic model that learns to distinguish dialects as a side task. We show that both techniques outperform the naive model trained on all dialectal data, reducing word error rates by 4.2% and 0.6%, respectively. Moreover, while achieving this improvement, neither technique degrades the performance of the dialect-specific models by more than 3.4%.
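
Both techniques named in the abstract have standard formulations. The sketch below, assuming PyTorch, shows the usual shape of each loss: Hinton-style distillation against the averaged distribution of the dialect-specific teachers, and a weighted auxiliary dialect-classification loss for MTL. The temperature, auxiliary weight, and function names are illustrative assumptions, not the paper's reported setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits_per_dialect: list,
                      T: float = 2.0) -> torch.Tensor:
    """Distill an ensemble of dialect-specific teachers into one student:
    match the student's softened distribution to the averaged teacher
    distribution (standard Hinton-style distillation; T is assumed)."""
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_per_dialect]
    ).mean(dim=0)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, teacher_probs, reduction="batchmean") * T * T

def multitask_loss(phone_logits: torch.Tensor, dialect_logits: torch.Tensor,
                   phone_targets: torch.Tensor, dialect_targets: torch.Tensor,
                   aux_weight: float = 0.1) -> torch.Tensor:
    """MTL: the main acoustic-state classification loss plus a weighted
    dialect-identification side task (aux_weight is assumed)."""
    main = F.cross_entropy(phone_logits, phone_targets)
    aux = F.cross_entropy(dialect_logits, dialect_targets)
    return main + aux_weight * aux
```

In the MTL setup, the dialect-classification head would share the acoustic model's hidden layers and be discarded at inference time; only the unified acoustic model is deployed.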