Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech

Jaka Aris Eko Wibawa

Supheakmungkol Sarin

Chen Fang Li

Knot Pipatsrisawat

Keshan Sodimana

Oddur Kjartansson

Alexander Gutkin

Martin Jansche

Linne Ha

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA), 7-12 May 2018, Miyazaki, Japan, pp. 1610-1614


Abstract

We present multi-speaker text-to-speech corpora for the Javanese and Sundanese languages, the second and third largest languages of Indonesia, spoken by well over a hundred million people. The key objectives were to collect high-quality data in an affordable way and to share the data publicly with the speech community. To achieve this, we collaborated with two local universities in Java and streamlined our recording and crowdsourcing processes to produce corpora consisting of 5.8 thousand (Javanese) and 4.2 thousand (Sundanese) mixed-gender recordings. We used these corpora to build several configurations of multi-speaker neural network-based text-to-speech systems for Javanese and Sundanese. Subjective evaluations performed on these configurations demonstrate that multilingual configurations, in which Javanese and Sundanese are trained jointly with a larger Indonesian corpus, significantly outperform the systems constructed from a single language. We hope that sharing these corpora publicly and presenting our multilingual approach to text-to-speech will help the community scale up text-to-speech technologies to other lesser-resourced languages of Indonesia.
