Olivier Siohan

Research Areas: Speech Processing

Authored Publications
    Large Scale Self-Supervised Pretraining for Active Speaker Detection
    Alice Chuang
    Keith Johnson
    Tony (Tuấn) Nguyễn
    Wei Xia
    Yunfan Ye
    ICASSP 2024 (2024) (to appear)
    In this work we investigate the impact of a large-scale self-supervised pretraining strategy for active speaker detection (ASD) on an unlabeled dataset consisting of over 125k hours of YouTube videos. Compared to a baseline trained from scratch on much smaller in-domain labeled datasets, we show that pretraining not only yields more stable supervised training, thanks to better audio-visual features used for initialization, but also improves ASD mean average precision by 23% on a challenging dataset collected with Google Nest Hub Max devices capturing real user interactions.
    In streaming settings, speech recognition models have to map sub-sequences of speech to text before the full audio stream becomes available. However, since alignment information between speech and text is rarely available during training, models need to learn it in a completely self-supervised way. In practice, the exponential number of possible alignments makes this extremely challenging, with models often learning peaky or sub-optimal alignments. Prima facie, the exponential nature of the alignment space makes it difficult to even quantify the uncertainty of a model's alignment distribution. Fortunately, it has been known for decades that the entropy of a probabilistic finite state transducer can be computed in time linear in the size of the transducer via a dynamic programming reduction based on semirings. In this work, we revisit the entropy semiring for neural speech recognition models, and show how alignment entropy can be used to supervise models through regularization or distillation. We also contribute an open-source implementation of CTC and RNN-T in the semiring framework that includes numerically stable and highly parallel variants of the entropy semiring. Empirically, we observe that the addition of alignment distillation improves the accuracy and latency of an already well-optimized teacher-student distillation model, achieving state-of-the-art performance on the Librispeech dataset in the streaming scenario.
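
    As a concrete illustration of the semiring construction mentioned in this abstract, the sketch below computes the entropy of the path distribution of a tiny lattice with a forward dynamic program over pairs ⟨probability mass, mass-weighted log-probability⟩. It is a minimal, hypothetical example in plain Python, not the open-source CTC/RNN-T implementation referenced above.

```python
# Minimal sketch of the entropy semiring (illustrative only; not the paper's
# open-source CTC/RNN-T implementation). Each semiring element is a pair
# <x, y>: x accumulates path probability mass, y accumulates
# sum over paths of w(path) * log w(path).
import math

def e_plus(a, b):
    """Semiring addition: elementwise sum of the two components."""
    return (a[0] + b[0], a[1] + b[1])

def e_times(a, b):
    """Semiring multiplication: <x1*x2, x1*y2 + x2*y1>."""
    return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])

def arc_weight(p):
    """Lift an arc probability p into the entropy semiring."""
    return (p, p * math.log(p))

def lattice_entropy(arcs, num_states, start, final):
    """Forward dynamic program over a lattice given in topological order.

    arcs: list of (src_state, dst_state, probability).
    Returns the entropy of the normalized distribution over complete paths.
    """
    zero, one = (0.0, 0.0), (1.0, 0.0)
    alpha = [zero] * num_states
    alpha[start] = one
    for src, dst, p in arcs:                 # assumes arcs are topologically sorted
        alpha[dst] = e_plus(alpha[dst], e_times(alpha[src], arc_weight(p)))
    z, w_logw = alpha[final]                 # <Z, sum_path w * log w>
    return math.log(z) - w_logw / z          # H = log Z - (1/Z) * sum w log w

# Toy 2-path lattice: 0 -> 1 -> 3 and 0 -> 2 -> 3 with unequal path weights.
arcs = [(0, 1, 0.7), (0, 2, 0.3), (1, 3, 1.0), (2, 3, 1.0)]
print(lattice_entropy(arcs, num_states=4, start=0, final=3))  # ~0.611 nats
```
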
    It has been shown that learning audiovisual features can lead to improved speech recognition performance over audio-only features, especially for noisy speech. However, in many common applications, the visual features are partially or entirely missing, e.g., the speaker might move off screen. Multi-modal models need to be robust: missing video frames should not degrade the performance of an audiovisual model to be worse than that of a single-modality audio-only model. While there have been many attempts at building robust models, there is little consensus on how robustness should be evaluated. To address this, we introduce a framework that allows claims about robustness to be evaluated in a precise and testable way. We also conduct a systematic empirical study of the robustness of common audiovisual speech recognition architectures on a range of acoustic noise conditions and test suites. Finally, we show that an architecture-agnostic solution based on cascades can consistently achieve robustness to missing video, even in settings where existing techniques for robustness like dropout fall short.
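
    The cascade idea can be pictured with a small sketch: route each utterance to the audio-visual model only when enough video is present, otherwise fall back to the audio-only model. The interfaces below (av_model, audio_only_model, the coverage threshold) are hypothetical placeholders, not the paper's actual implementation.

```python
# Illustrative sketch of a cascaded fallback for missing video (hypothetical
# interfaces; the paper does not prescribe this exact API). The idea: use the
# audio-visual model only when video frames are present, otherwise fall back
# to the audio-only model, so missing video can never make the system worse
# than the single-modality baseline.
from typing import Optional, Sequence

def cascade_transcribe(audio: Sequence[float],
                       video_frames: Optional[Sequence],
                       av_model,
                       audio_only_model,
                       min_video_coverage: float = 0.5) -> str:
    """Return a transcript, choosing the model per utterance.

    video_frames may be None or partially empty; coverage below the threshold
    triggers the audio-only fallback.
    """
    if video_frames:
        coverage = sum(f is not None for f in video_frames) / len(video_frames)
    else:
        coverage = 0.0
    if coverage >= min_video_coverage:
        return av_model.transcribe(audio, video_frames)
    return audio_only_model.transcribe(audio)
```
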
    Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible this traditionally requires solving a separate problem, namely active speaker detection (ASD), which entails selecting at each moment in time which of the visible faces corresponds to the audio. Recent work has shown that we can solve both problems simultaneously by employing an attention mechanism over the competing video tracks of the speakers' faces, at the cost of sacrificing some accuracy on active speaker detection. This work closes this gap between speech recognition and active speaker detection accuracy by presenting a single model that can be jointly trained with a multi-task loss. By combining the two tasks during training we reduce the gap in ASD classification accuracy by approximately 25%, while simultaneously improving the ASR performance when compared to the multi-person baseline trained exclusively for ASR.
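
    A rough sketch of the multi-task objective described above: the total loss is a weighted sum of the ASR loss and an ASD classification loss computed over the competing face tracks. The loss weight and the toy numbers are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a joint multi-task objective of the kind described above
# (names, weighting, and numbers are illustrative, not the paper's exact
# formulation): a single encoder feeds both an ASR head and an ASD head, and
# the total loss is a weighted sum of the two task losses.
import numpy as np

def multitask_loss(asr_loss: float, asd_loss: float, asd_weight: float = 0.1) -> float:
    """L_total = L_asr + lambda * L_asd."""
    return asr_loss + asd_weight * asd_loss

# Example: combine a per-batch ASR loss with a per-frame speaker-detection
# cross-entropy over the candidate faces, averaged over frames.
asd_logits = np.array([[2.0, -1.0], [0.5, 0.3]])          # (frames, faces)
asd_labels = np.array([0, 1])                              # active face per frame
log_probs = asd_logits - np.log(np.exp(asd_logits).sum(axis=1, keepdims=True))
asd_ce = -log_probs[np.arange(len(asd_labels)), asd_labels].mean()
print(multitask_loss(asr_loss=42.7, asd_loss=asd_ce, asd_weight=0.1))
```
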
    Audio-visual automatic speech recognition (AV-ASR) extends speech recognition by introducing the video modality. In particular, the information contained in the motion of the speaker's mouth is used to augment the audio features. The video modality is traditionally processed with a 3D convolutional neural network (e.g. a 3D version of VGG). Recently, image transformer networks (Dosovitskiy et al., 2020) demonstrated the ability to extract rich visual features for the image classification task. In this work, we propose to replace the 3D convolution with a video transformer as the video feature extractor. We train our baselines and the proposed model on a large-scale corpus of YouTube videos. We then evaluate the performance on a labeled subset of YouTube as well as on the public LRS3-TED corpus. Our best video-only model achieves 34.9% WER on YTDEV18 and 19.3% on LRS3-TED, a 10% and 9% relative improvement over the convolutional baseline. After fine-tuning our model, we achieve state-of-the-art audio-visual recognition performance on LRS3-TED (1.6% WER).
    Audio-visual automatic speech recognition (AV-ASR) introduces the video modality into the speech recognition process, in particular often relying on information conveyed by the motion of the speaker's mouth. The use of the visual signal requires extracting visual features, which are then combined with the acoustic features to build an AV-ASR system (Makino et al., 2019). This is traditionally done with some form of 3D convolutional network (e.g. VGG), as widely used in the computer vision community. Recently, video transformers (Dosovitskiy et al., 2020) have been introduced to extract visual features useful for image classification tasks. In this work, we propose to replace the 3D convolutional visual frontend typically used for AV-ASR and lip-reading tasks with a video transformer frontend. We train our systems on a large-scale dataset composed of YouTube videos and evaluate performance on the publicly available LRS3-TED set, as well as on a large set of YouTube videos. On a lip-reading task, the transformer-based frontend shows superior performance compared to a strong convolutional baseline. On an AV-ASR task, the transformer frontend performs as well as a VGG frontend for clean audio, but outperforms the VGG frontend when the audio is corrupted by noise.
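
    To make the frontend swap concrete, here is a hedged PyTorch sketch contrasting a VGG-style 3D-convolution frontend with a transformer frontend that attends over spatial patches of each frame. Layer sizes, patch size, and pooling choices are illustrative placeholders rather than the configuration used in these papers.

```python
# Hedged sketch (PyTorch) of the two visual frontends contrasted above. Shapes,
# layer sizes, and the patch-embedding scheme are illustrative assumptions,
# not the configuration used in the papers.
import torch
import torch.nn as nn

class Conv3DFrontend(nn.Module):
    """VGG-style 3D-convolution frontend: video -> per-frame feature vectors."""
    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.conv = nn.Conv3d(3, out_dim, kernel_size=(3, 7, 7), padding=(1, 3, 3))
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))   # keep time, pool space

    def forward(self, video):                            # (B, 3, T, H, W)
        x = torch.relu(self.conv(video))
        x = self.pool(x).squeeze(-1).squeeze(-1)         # (B, out_dim, T)
        return x.transpose(1, 2)                         # (B, T, out_dim)

class TransformerFrontend(nn.Module):
    """Transformer frontend: per-frame patch embedding + attention over patches."""
    def __init__(self, out_dim: int = 256, num_layers: int = 4, num_heads: int = 4):
        super().__init__()
        self.embed = nn.Conv3d(3, out_dim, kernel_size=(1, 16, 16), stride=(1, 16, 16))
        layer = nn.TransformerEncoderLayer(d_model=out_dim, nhead=num_heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, video):                            # (B, 3, T, H, W)
        x = self.embed(video)                            # (B, D, T, H/16, W/16)
        b, d, t, h, w = x.shape
        tokens = x.permute(3, 4, 0, 2, 1).reshape(h * w, b * t, d)  # patches as sequence
        tokens = self.encoder(tokens)                    # self-attention over patches
        frame_feats = tokens.mean(dim=0)                 # (B*T, D), pooled over patches
        return frame_feats.reshape(b, t, d)              # (B, T, D)

feats = TransformerFrontend()(torch.randn(2, 3, 8, 64, 64))
print(feats.shape)                                       # torch.Size([2, 8, 256])
```
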
    This paper investigates an end-to-end modeling approach for ASR that explicitly deals with scenarios where there are overlapping speech utterances from multiple talkers. The approach assumes the availability of both audio signals and video signals in the form of continuous mouth-tracks aligned with speech for overlapping speakers. This work extends previous work on audio-only multi-talker ASR applied to two-party conversations in a call center application. It also extends work on end-to-end audio-visual (A/V) ASR applied to A/V YouTube (YT) Confidence Island utterances. It is shown that incorporating an attention-weighted combination of visual features in A/V multi-talker RNN-T models significantly improves speaker disambiguation in ASR on overlapping speech. A 17% reduction in WER was observed for A/V multi-talker models relative to audio-only multi-talker models on a simulated A/V overlapped speech corpus.
    Streaming end-to-end automatic speech recognition (ASR) systems are widely used in everyday applications that require transcribing speech to text in real time. Their small size and minimal latency make them suitable for such tasks. Unlike their non-streaming counterparts, streaming models are constrained to be causal, with no future context. Nevertheless, non-streaming models can be used as teacher models to improve streaming ASR systems: an arbitrarily large set of unsupervised utterances is distilled from such teacher models so that streaming models can be trained on the generated labels. However, the performance gap between teacher and student word error rates (WER) remains high. In this paper, we propose to reduce this gap by using a diversified set of non-streaming teacher models and combining them using Recognizer Output Voting Error Reduction (ROVER). In particular, fusing RNN-T and CTC models makes stronger teachers, as they improve the performance of streaming student models. With this approach, we outperform a baseline streaming RNN-T trained from non-streaming RNN-T teachers by 27% to 42% depending on the language.
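
    For intuition, the snippet below shows the voting half of a ROVER-style combination in a heavily simplified form: it assumes the teacher hypotheses have already been word-aligned (real ROVER builds the alignment with dynamic programming first) and takes a per-slot majority vote to produce the label used to train the streaming student.

```python
# Greatly simplified stand-in for ROVER voting (illustrative only): real ROVER
# first builds a word transition network by dynamic-programming alignment of
# the hypotheses; here the hypotheses are assumed to be already aligned
# position by position (None marks a deletion) and a simple majority vote
# picks the combined word sequence.
from collections import Counter
from typing import List, Optional

def rover_vote(aligned_hyps: List[List[Optional[str]]]) -> List[str]:
    """Combine position-aligned hypotheses from several teacher models."""
    combined = []
    for slot in zip(*aligned_hyps):                      # one column per word slot
        winner, _ = Counter(slot).most_common(1)[0]
        if winner is not None:                           # a None winner means "delete"
            combined.append(winner)
    return combined

hyps = [
    ["play", "some", "jazz", None],
    ["play", "sum",  "jazz", "music"],
    ["play", "some", "jazz", "music"],
]
print(" ".join(rover_vote(hyps)))                        # "play some jazz music"
```
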
    Audio-visual automatic speech recognition is a promising approach to robust ASR under noisy conditions. However, until recently it had traditionally been studied in isolation, assuming the video of a single speaking face matches the audio, and selecting the active speaker at inference time when multiple people are on screen was set aside as a separate problem. As an alternative, recent work has proposed to address the two problems simultaneously with an attention mechanism, baking the speaker selection problem directly into a fully differentiable model. One interesting finding was that the attention indirectly learns the association between the audio and the speaking face even though this correspondence is never explicitly provided at training time. In the present work we further investigate this connection and examine the interplay between the two problems. With experiments carried out over 50 thousand hours of public YouTube videos as training data, we first evaluate the accuracy of the attention layer on an active speaker selection task. Second, we show under closer scrutiny that the end-to-end model performs at least as well as a considerably larger two-step system connected with a hard decision boundary, under various noise conditions and numbers of parallel face tracks.
    Traditionally, audio-visual automatic speech recognition has been studied under the assumption that the speaking face in the visual signal is the face matching the audio. However, in a more realistic setting, when multiple faces are potentially on screen, one needs to decide which face to feed to the A/V ASR system. The present work takes the recent progress of A/V ASR one step further and considers the scenario where multiple people are simultaneously on screen (multi-person A/V ASR). We propose a fully differentiable A/V ASR model that is able to handle multiple face tracks in a video. Instead of relying on two separate models for speaker face selection and audio-visual ASR on a single face track, we introduce an attention layer to the ASR encoder that is able to soft-select the appropriate face video track. Experiments carried out on an A/V system trained on over 30k hours of YouTube videos illustrate that the proposed approach can automatically select the proper face tracks with minor WER degradation compared to an oracle selection of the speaking face, while still showing the benefits of employing the visual signal instead of the audio alone.
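
    A minimal numpy sketch of the soft-selection mechanism: an audio-derived query scores each candidate face track per frame, a softmax over tracks yields attention weights, and the encoder consumes the weighted mixture of visual features. Dimensions and the dot-product scoring are assumptions for illustration, not the exact architecture.

```python
# Minimal numpy sketch of soft-selection over face tracks (dimensions and the
# scoring function are illustrative, not the paper's architecture): an
# audio-derived query attends over per-track visual embeddings, and the ASR
# encoder consumes the attention-weighted mixture instead of a single
# hard-selected face track.
import numpy as np

def soft_select_faces(audio_query: np.ndarray, track_feats: np.ndarray):
    """audio_query: (T, D); track_feats: (num_tracks, T, D).

    Returns the soft-selected visual features (T, D) and the per-frame
    attention weights (T, num_tracks).
    """
    # Scaled dot-product scores between the audio query and each track, per frame.
    scores = np.einsum("td,ktd->tk", audio_query, track_feats) / np.sqrt(audio_query.shape[-1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over tracks
    selected = np.einsum("tk,ktd->td", weights, track_feats)
    return selected, weights

rng = np.random.default_rng(0)
audio = rng.normal(size=(100, 64))                       # 100 frames, 64-dim query
tracks = rng.normal(size=(3, 100, 64))                   # 3 candidate face tracks
feats, att = soft_select_faces(audio, tracks)
print(feats.shape, att.shape)                            # (100, 64) (100, 3)
```
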
    Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
    Basi Garcia
    Brendan Shillingford
    Yannis Assael
    Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (2019)
    This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture. To support the development of such a system, we built a large audio-visual (AV) dataset of segmented utterances extracted from public YouTube videos, leading to 31k hours of audio-visual training content. The performance of audio-only, visual-only, and audio-visual systems is compared on two large-vocabulary test sets: an internal set of YouTube utterances (YouTube-AV-Dev-18) and the publicly available LRS3-TED set. To highlight the contribution of the visual modality, we also evaluate the performance of our system on the YouTube-AV-Dev-18 set artificially corrupted with additive background noise and overlapping speech. To the best of our knowledge, our system significantly improves the state of the art on the LRS3-TED set.
    This paper describes the technical and system-building advances made to the Google Home multichannel speech recognition system, which was launched in November 2016. Technical advances include an adaptive dereverberation frontend, the use of neural network models that perform multichannel processing jointly with acoustic modeling, and grid LSTMs to model frequency variations. On the system level, improvements include adapting the model using Google Home specific data. We present results on a variety of multichannel sets. The combination of technical and system advances results in a WER reduction of over 18% relative compared to the current production system.
    Automatic Optimization of Data Perturbation Distributions for Multi-Style Training in Speech Recognition
    Mortaza Doulaty
    Proceedings of the IEEE 2016 Workshop on Spoken Language Technology (SLT2016)
    Speech recognition performance using deep neural network based acoustic models is known to degrade when the acoustic environment and the speaker population in the target utterances are significantly different from the conditions represented in the training data. To address these mismatched scenarios, multi-style training (MTR) has been used to perturb utterances in an existing uncorrupted, and potentially mismatched, training speech corpus to better match target domain utterances. This paper addresses the problem of determining the distribution of perturbation levels, for a given set of perturbation types, that best matches the target speech utterances. An approach is presented that, given a small set of utterances from a target domain, automatically identifies an empirical distribution of perturbation levels that can be applied to utterances in an existing training set. Distributions are estimated for perturbation types that include acoustic background environments, reverberant room configurations, and speaker-related variation such as frequency and temporal warping. The end goal is for the resulting perturbed training set to characterize the variability in the target domain and thereby optimize ASR performance. An experimental study evaluates the impact of this approach on ASR performance when the target utterances are taken from a simulated far-field acoustic environment.
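
    The sketch below illustrates how such an estimated distribution might be applied: each training utterance draws its perturbation levels (noise SNR, reverberation time, speed factor) from the empirical distribution estimated on the small target-domain sample. The perturbation types and probabilities shown are hypothetical placeholders, not values from the paper.

```python
# Illustrative sketch of applying an estimated perturbation-level distribution
# to a training corpus (perturbation types and probabilities below are
# placeholders, not values from the paper): each training utterance is assigned
# noise/reverberation/warping levels sampled from the empirical distribution
# estimated on the small target-domain sample.
import random

# Hypothetical empirical distribution over perturbation levels, as would be
# estimated from target-domain utterances.
PERTURBATION_DIST = {
    "snr_db":       {5: 0.2, 10: 0.5, 20: 0.3},
    "rt60_sec":     {0.2: 0.4, 0.5: 0.4, 0.8: 0.2},
    "speed_factor": {0.9: 0.25, 1.0: 0.5, 1.1: 0.25},
}

def sample_perturbation(rng: random.Random) -> dict:
    """Draw one perturbation configuration from the estimated distribution."""
    config = {}
    for name, dist in PERTURBATION_DIST.items():
        levels, probs = zip(*dist.items())
        config[name] = rng.choices(levels, weights=probs, k=1)[0]
    return config

rng = random.Random(0)
for utt_id in ["utt_001", "utt_002", "utt_003"]:
    print(utt_id, sample_perturbation(rng))
```
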
    While research has often shown that building dialect-specific automatic speech recognizers is the optimal approach to dealing with dialectal variations of the same language, we have observed that dialect-specific recognizers do not always output the best recognitions: often enough, another dialectal recognizer outputs a better recognition than the dialect-specific one. In this paper, we present two methods to select and combine the best decoded hypothesis from a pool of dialectal recognizers. We follow a machine learning approach, extracting features from the speech recognition output along with word embeddings, and use shallow neural networks for classification. Our experiments using Dictation and Voice Search data from the four main Arabic dialects show good WER improvements for the hypothesis selection scheme, reducing the WER by 2.1% to 12.1% depending on the test set, and promising results for the hypothesis combination scheme.
    Recently, Google launched YouTube Kids, a mobile application for children that uses a speech recognizer built specifically for recognizing children's speech. In this paper we present the techniques we explored to build such a system. We describe the use of a neural network classifier to identify matched acoustic training data, and the filtering of language modeling data to reduce the chance of producing offensive results. We also compare long short-term memory (LSTM) recurrent networks to convolutional, LSTM, deep neural networks (CLDNNs). We found that a CLDNN acoustic model outperforms an LSTM across a variety of conditions, but does not model child speech relatively better than adult speech. Overall, these findings allow us to build a successful, state-of-the-art large vocabulary speech recognizer for both children and adults.
    In this paper we construct a data set for semi-supervised acoustic model training by selecting spoken utterances from a massive collection of anonymized Google Voice Search utterances. Semi-supervised training usually retains high-confidence utterances, which are presumed to have an accurate hypothesized transcript, a necessary condition for successful training. Selecting high-confidence utterances can, however, restrict the diversity of the resulting data set. We propose to introduce a constraint enforcing that the distribution of context-dependent state symbols, obtained by running forced alignment of the hypothesized transcript, matches a reference distribution estimated from a curated development set. The quality of the resulting training set is illustrated on large-scale Voice Search recognition experiments, where it outperforms random selection of high-confidence utterances.
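
    One simple way to realize the distribution-matching constraint is a greedy selection that, at each step, adds the utterance whose context-dependent state counts bring the selected set closest (in KL divergence) to the reference distribution. The sketch below illustrates that idea on toy data; it is not the exact selection procedure used in the paper.

```python
# Simple greedy sketch of the distribution-matching idea described above (not
# the paper's exact procedure): from a pool of high-confidence utterances,
# repeatedly pick the utterance whose context-dependent state counts move the
# selected set's state distribution closest (in KL divergence) to the reference
# distribution estimated on a curated development set.
import numpy as np

def kl_to_reference(counts: np.ndarray, reference: np.ndarray, eps: float = 1e-9) -> float:
    """KL(reference || normalized counts), with smoothing to avoid division by zero."""
    p = reference
    q = (counts + eps) / (counts.sum() + eps * len(counts))
    return float(np.sum(p * np.log(p / q)))

def greedy_select(utt_state_counts: np.ndarray, reference: np.ndarray, num_utts: int):
    """utt_state_counts: (num_pool, num_states) CD-state counts per utterance."""
    selected, totals = [], np.zeros(utt_state_counts.shape[1])
    remaining = set(range(len(utt_state_counts)))
    for _ in range(num_utts):
        best = min(remaining,
                   key=lambda i: kl_to_reference(totals + utt_state_counts[i], reference))
        selected.append(best)
        totals += utt_state_counts[best]
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
pool = rng.integers(0, 20, size=(200, 8)).astype(float)   # toy pool, 8 CD states
reference = np.full(8, 1 / 8)                              # uniform target distribution
print(greedy_select(pool, reference, num_utts=5))
```
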
    A big data approach to acoustic model training corpus selection
    John Alex
    Conference of the International Speech Communication Association (Interspeech) (2014)
    Deep neural networks (DNNs) have recently become the state-of-the-art technology in speech recognition systems. In this paper we propose a new approach to constructing large, high-quality unsupervised sets to train DNN models for large vocabulary speech recognition. The core of our technique consists of two steps. We first redecode speech logged by our production recognizer with a very accurate (and hence too slow for real-time usage) set of speech models to improve the quality of the ground-truth transcripts used for training alignments. Using confidence scores, transcript length, and transcript flattening heuristics designed to cull salient utterances from three decades of speech per language, we then carefully select training data sets consisting of up to 15K hours of speech to be used to train acoustic models without any reliance on manual transcription. We show that this approach yields models with approximately 18K context-dependent states that achieve a 10% relative improvement in large vocabulary dictation and voice-search systems for Brazilian Portuguese, French, Italian and Russian.
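
    The selection heuristics can be illustrated with a small filter over redecoded utterances, keeping only those with high confidence and a plausible transcript length. Thresholds and the omitted transcript-flattening step are assumptions for illustration only.

```python
# Hedged sketch of confidence- and length-based culling of redecoded utterances
# (thresholds and field names are illustrative; the paper also applies a
# transcript-flattening heuristic that is not reproduced here).
def keep_utterance(confidence: float, transcript: str,
                   min_confidence: float = 0.9,
                   min_words: int = 3, max_words: int = 30) -> bool:
    """Retain an utterance only if the redecoded transcript looks trustworthy."""
    num_words = len(transcript.split())
    return confidence >= min_confidence and min_words <= num_words <= max_words

utterances = [
    (0.97, "set a timer for ten minutes"),
    (0.55, "uh maybe the thing"),            # low confidence: dropped
    (0.99, "ok"),                            # too short: dropped
]
selected = [t for c, t in utterances if keep_utterance(c, t)]
print(selected)
```
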
    In large vocabulary continuous speech recognition, decision trees are widely used to cluster triphone states. In addition to the commonly used phonetically based questions, others have proposed additional questions such as phone position within the word or syllable. This paper examines using the word or syllable context itself as a feature in the decision tree, providing an elegant way of introducing word- or syllable-specific models into the system. Positive results are reported on two state-of-the-art systems, voicemail transcription and search by voice, across a variety of acoustic model and training set sizes.
    An Audio Indexing System for Election Video Material
    Christopher Alberti
    Ari Bezman
    Anastassia Drofa
    Ted Power
    Arnaud Sahuguet
    Maria Shugrina
    Proceedings of ICASSP (2009), pp. 4873-4876
    In the 2008 presidential election race in the United States, the prospective candidates made extensive use of YouTube to post video material. We developed a scalable system that transcribes this material, makes the content searchable (by indexing the meta-data and transcripts of the videos), and allows the user to navigate through the video material based on content. The system is available as an iGoogle gadget as well as a Labs product. Given the large exposure, special emphasis was put on the scalability and reliability of the system. This paper describes the design and implementation of the system.
    The IBM 2007 speech transcription system for European parliamentary speeches
    Bhuvana Ramabhadran
    Abhinav Sethy
    ASRU (2007), pp. 472-477
    Vocabulary independent spoken term detection
    Jonathan Mamou
    Bhuvana Ramabhadran
    SIGIR (2007), pp. 615-622
    Comments on Vocal Tract Length Normalization Equals Linear Transformation in Cepstral Space
    Mohamed Afify
    IEEE Transactions on Audio, Speech & Language Processing, vol. 15 (2007), pp. 1731-1732
    The IBM 2006 speech transcription system for European parliamentary speeches
    Bhuvana Ramabhadran
    Lidia Mangu
    Geoffrey Zweig
    Martin Westphal
    Henrik Schulz
    Alvaro Soneiro
    INTERSPEECH (2006)
    The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings
    Jing Huang
    Martin Westphal
    Stanley F. Chen
    Daniel Povey
    Vit Libal
    Alvaro Soneiro
    Henrik Schulz
    Thomas Ross
    Gerasimos Potamianos
    MLMI (2006), pp. 432-443
    Automated Quality Monitoring for Call Centers using Speech and NLP Technologies
    Geoffrey Zweig
    George Saon
    Bhuvana Ramabhadran
    Daniel Povey
    Lidia Mangu
    Brian Kingsbury
    HLT-NAACL (2006)
    Fast vocabulary-independent audio search using path-based graph indexing
    INTERSPEECH (2005), pp. 53-56
    A new verification-based fast-match for large vocabulary continuous speech recognition
    Mohamed Afify
    Feng Liu
    Hui Jiang
    IEEE Transactions on Speech and Audio Processing, vol. 13 (2005), pp. 546-553
    Use of metadata to improve recognition of spontaneous speech and named entities
    Bhuvana Ramabhadran
    Geoffrey Zweig
    INTERSPEECH (2004)
    Sequential estimation with optimal forgetting for robust speech recognition
    Mohamed Afify
    IEEE Transactions on Speech and Audio Processing, vol. 12 (2004), pp. 19-26
    Speech recognition error analysis on the English MALACH corpus
    Bhuvana Ramabhadran
    Geoffrey Zweig
    INTERSPEECH (2004)
    Hierarchical class n-gram language models: towards better estimation of unseen events in speech recognition
    Imed Zitouni
    Chin-Hui Lee
    INTERSPEECH (2003)
    Advances in natural language call routing
    Hong-Kwang Jeff Kuo
    Joseph P. Olive
    Bell Labs Technical Journal, vol. 7 (2003), pp. 155-170
    Backoff hierarchical class n-gram language modelling for automatic speech recognition systems
    Imed Zitouni
    Hong-Kwang Jeff Kuo
    Chin-Hui Lee
    INTERSPEECH (2002)
    A discriminative training criterion and an associated EM learning algorithm
    Mohamed Afify
    ICASSP (2002), pp. 1065-1068
    Bell labs approach to Aurora evaluation on connected digit recognition
    Jingdong Chen
    Dimitris Dimitriadis
    Hui Jiang
    Qi Li
    Tor André Myrvoll
    Frank K. Soong
    INTERSPEECH (2002)
    Structural maximum a posteriori linear regression for fast HMM adaptation
    Tor André Myrvoll
    Chin-Hui Lee
    Computer Speech & Language, vol. 16 (2002), pp. 5-24
    A dynamic in-search discriminative training approach for large vocabulary speech recognition
    Hui Jiang
    Frank K. Soong
    Chin-Hui Lee
    ICASSP (2002), pp. 113-116
    Towards knowledge-based features for HMM based large vocabulary automatic speech recognition
    Benoit Launay
    Arun C. Surendran
    Chin-Hui Lee
    ICASSP (2002), pp. 817-820
    Upper and lower bounds on the mean of noisy speech: application to minimax classification
    Mohamed Afify
    Chin-Hui Lee
    IEEE Transactions on Speech and Audio Processing, vol. 10 (2002), pp. 79-88
    Minimax classification with parametric neighborhoods for noisy speech recognition
    Mohamed Afify
    Chin-Hui Lee
    INTERSPEECH (2001), pp. 2355-2358
    Joint maximum a posteriori adaptation of transformation and HMM parameters
    Cristina Chesta
    Chin-Hui Lee
    IEEE Transactions on Speech and Audio Processing, vol. 9 (2001), pp. 417-428
    A new verification-based fast match approach to large vocabulary speech recognition
    Feng Liu
    Mohamed Afify
    Hui Jiang
    INTERSPEECH (2001), pp. 851-854
    An auditory system-based feature for robust speech recognition
    Qi Li
    Frank K. Soong
    INTERSPEECH (2001), pp. 619-622
    A real-time Japanese broadcast news closed-captioning system
    Akio Ando
    Mohamed Afify
    Hui Jiang
    Chin-Hui Lee
    Qi Li
    Feng Liu
    Kazuo Onoe
    Frank K. Soong
    Qiru Zhou
    INTERSPEECH (2001), pp. 495-498
    Evaluating the Aurora connected digit recognition task - a bell labs approach
    Mohamed Afify
    Hui Jiang
    Filipp Korkmazskiy
    Chin-Hui Lee
    Qi Li
    Frank K. Soong
    Arun C. Surendran
    INTERSPEECH (2001), pp. 633-636
    Small group speaker identification with common password phrases
    Aaron E. Rosenberg
    S. Parthasarathy
    Speech Communication, vol. 31 (2000), pp. 131-140
    Constrained maximum likelihood linear regression for speaker adaptation
    Mohamed Afify
    INTERSPEECH (2000), pp. 861-864
    Extended maximum a posterior linear regression (EMAPLR) model adaptation for speech recognition
    Wu Chou
    Tor André Myrvoll
    Chin-Hui Lee
    INTERSPEECH (2000), pp. 616-619
    Structural maximum a-posteriori linear regression for unsupervised speaker adaptation
    Tor André Myrvoll
    Chin-Hui Lee
    Wu Chou
    INTERSPEECH (2000), pp. 540-543
    A high-performance auditory feature for robust speech recognition
    Qi Li
    Frank K. Soong
    INTERSPEECH (2000), pp. 51-54
    Maximum a posteriori linear regression for hidden Markov model adaptation
    Cristina Chesta
    Chin-Hui Lee
    EUROSPEECH (1999)
    Comparative experiments of several adaptation approaches to noisy speech recognition using stochastic trajectory models
    Yifan Gong
    Jean Paul Haton
    Speech Communication, vol. 18 (1996), pp. 335-352
    Noise adaptation using linear regression for continuous noisy speech recognition
    Yifan Gong
    Jean Paul Haton
    EUROSPEECH (1995)
    A comparison of three noisy speech recognition approaches
    Yifan Gong
    Jean Paul Haton
    ICSLP (1994)
    A Bayesian approach to phone duration adaptation for Lombard speech recognition
    Yifan Gong
    Jean Paul Haton
    EUROSPEECH (1993)
    Minimization of speech alignment error by iterative transformation for speaker adaptation
    Yifan Gong
    Jean Paul Haton
    ICSLP (1992)