Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages

Alexander Gutkin

Proc. of Interspeech 2017, International Speech Communication Association (ISCA), August 20--24, Stockholm, Sweden, pp. 2183-2187

Download Google Scholar

Abstract

Acquiring data for text-to-speech (TTS) systems is expensive. This typically requires large amounts of training data, which is not available for low-resourced languages. Sometimes small amounts of data can be collected, while often - no data may be available at all. This paper presents acoustic modeling approach utilizing long short-term memory (LSTM) recurrent neural network (RNN) aimed at partially addressing the language data scarcity problem. Unlike speaker-adaption systems that aim to preserve speaker similarity across languages, the salient feature of the proposed approach is that, once constructed, the resulting system does not need retraining to cope with the previously unseen languages. This is due to language and speaker-agnostic model topology and universal linguistic feature set. Experiments on twelve languages show that the system is able to produce intelligible and sometimes natural output when language is unseen. We also show that, when small amounts of training data are available, pooling the data sometimes improves the overall intelligibility and naturalness. Finally, we show that sometimes having a multilingual system with no prior exposure to the language is better than building single-speaker system from small amounts of data for that language.

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages

Abstract

Research Areas

Learn more about how we conduct our research

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages

Abstract

Research Areas

Learn more about how we conduct our research

AI/ML Foundations  & Capabilities