TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING

Cited by: 0

Authors:

Renovalles, Edsel Jedd [1]

Lucas, Crisron Rudolf [1]

de Leon, Franz [1]

Aquino, Angelina [1]

Jalandoni, Izza [1]

Affiliations:

[1] University of the Philippines Diliman, Electrical and Electronics Engineering Institute, Quezon City, Philippines

Source:

2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2021

Keywords:

data augmentation; deep learning; text-to-speech; unit selection; voice conversion

DOI:

10.1109/O-COCOSDA202152914.2021.9660431

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

Several text-to-speech (TTS) systems have been developed and published for Philippine languages, particularly Filipino and Cebuano. However, the performance of existing systems still needs improvement. The limited linguistic resources and scarcity of speech data for Philippine languages make it difficult to develop a reliable TTS system for them. In this paper, we implement and evaluate two TTS systems for Filipino: a unit selection system built with MaryTTS and a deep learning system based on Tacotron-2. We experimented with modifying the F0 contour and duration in the unit selection system, and we used voice conversion to augment the training data of Tacotron-2. The unit selection system achieved a mean opinion score (MOS) of 3.05; however, the boundary-based F0 modification produced perceivable distortions in the output and requires further refinement to be effective. In contrast, using voice conversion to transform the original multi-speaker data into single-speaker data and to produce more training samples raised the overall MOS of Tacotron-2 from 1.51 to 2.01.
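
The abstract reports results as mean opinion scores. As a minimal sketch of how such a score is commonly computed, the snippet below averages listener ratings on a 1 to 5 scale and attaches an approximate 95% confidence interval. The function name `mean_opinion_score`, the example ratings, and the normal-approximation interval are assumptions made for illustration; they are not the paper's data or its evaluation protocol.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    `ratings` is a list of listener scores on a 1-5 scale.
    The interval uses a normal approximation (z = 1.96), which is an
    assumption of this sketch and is rough for small listener panels.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")

    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mos, 0.0  # no spread can be estimated from a single rating

    # Standard error of the mean from the sample standard deviation.
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, 1.96 * std_err


# Hypothetical ratings for one synthesized utterance (not from the paper).
example_ratings = [3, 4, 2, 3, 3, 4, 3, 2]
mos, half_width = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean helps when comparing systems whose scores are close, since a small listener panel can produce wide intervals.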

Citation:

Pages: 212-217

Page count: 6
