TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING

Authors
Renovalles, Edsel Jedd [1]
Lucas, Crisron Rudolf [1]
de Leon, Franz [1]
Aquino, Angelina [1]
Jalandoni, Izza [1]
Affiliations
[1] Univ Philippines Diliman, Elect & Elect Engn Inst, Quezon City, Philippines
Source
2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2021
Keywords
data augmentation; deep learning; text-to-speech; unit selection; voice conversion;
DOI
10.1109/O-COCOSDA202152914.2021.9660431
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104 ; 100213 ;
Abstract
Several text-to-speech (TTS) systems have been developed and published for Philippine languages, particularly Filipino and Cebuano. However, the performance of existing systems still needs improvement. Limited linguistic resources and a lack of speech data for Philippine languages make it difficult to develop a reliable TTS system for them. In this paper, we implement and evaluate two TTS systems for Filipino: a unit selection system built with MaryTTS and a deep learning system built with Tacotron-2. We experimented with modifying the F0 contour and duration in the unit selection system, and used voice conversion to augment the training data for Tacotron-2. The unit selection system achieved a mean opinion score (MOS) of 3.05; however, the boundary-based F0 modification produced perceptible distortions in the output and needs further refinement to be effective. In contrast, using voice conversion to transform the original multi-speaker data into single-speaker data and to produce more training samples raised the overall MOS of Tacotron-2 from 1.51 to 2.01.
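For readers unfamiliar with the metric, an MOS is the arithmetic mean of listener ratings, conventionally on a 1 to 5 scale. The following is a minimal Python sketch, not the authors' evaluation procedure: the function name `mean_opinion_score`, the listener ratings, and the normal-approximation confidence interval are all assumptions made for illustration. Only the two Tacotron-2 MOS values (1.51 and 2.01) come from the abstract.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Normal-approximation interval: an assumption, not the paper's method.
    if len(ratings) > 1:
        half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        half_width = 0.0
    return mos, half_width


# Hypothetical listener ratings for one system (illustrative only,
# not data from the paper).
ratings = [3, 4, 2, 3, 3, 4, 3, 2]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")

# Overall Tacotron-2 MOS values reported in the abstract,
# before and after voice-conversion data augmentation.
baseline, augmented = 1.51, 2.01
gain = augmented - baseline
print(f"absolute gain = {gain:.2f}, relative gain = {gain / baseline:.1%}")
```

On the reported figures, the augmentation corresponds to an absolute gain of 0.50 MOS points, roughly a one-third relative improvement over the baseline, although the augmented system still scores well below the unit selection system's 3.05.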
Pages: 212-217
Number of pages: 6