TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING

Cited by: 0

Authors:

Renovalles, Edsel Jedd [1]

Lucas, Crisron Rudolf [1]

de Leon, Franz [1]

Aquino, Angelina [1]

Jalandoni, Izza [1]

Affiliations:

[1] Univ Philippines Diliman, Elect & Elect Engn Inst, Quezon City, Philippines

Source:

2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2021

Keywords:

data augmentation; deep learning; text-to-speech; unit selection; voice conversion

DOI:

10.1109/O-COCOSDA202152914.2021.9660431

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Several text-to-speech (TTS) systems have been developed and published for Philippine languages, particularly Filipino and Cebuano, but the performance of existing systems still needs improvement. Limited linguistic resources and a lack of available speech data make it difficult to develop a reliable TTS system for these languages. In this paper, we implement and evaluate two TTS systems for Filipino: a unit selection system built with MaryTTS and a deep learning system built with Tacotron-2. We applied modifications to the F0 contour and duration in the unit selection system, and used voice conversion to augment the training data of Tacotron-2. The unit selection system achieved a mean opinion score (MOS) of 3.05; however, the boundary-based F0 modification yielded perceivable distortions in the output and requires further enhancement to become effective. Using voice conversion to transform the original multi-speaker data into single-speaker data and to produce more training samples raised the overall MOS of Tacotron-2 from 1.51 to 2.01.

Pages: 212-217

Page count: 6