TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING

Cited by: 0

Authors:

Renovalles, Edsel Jedd [1]

Lucas, Crisron Rudolf [1]

de Leon, Franz [1]

Aquino, Angelina [1]

Jalandoni, Izza [1]

Affiliations:

[1] Univ Philippines Diliman, Elect & Elect Engn Inst, Quezon City, Philippines

Source:

2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2021

Keywords:

data augmentation; deep learning; text-to-speech; unit selection; voice conversion

DOI:

10.1109/O-COCOSDA202152914.2021.9660431

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104 ; 100213 ;

Abstract:

Several text-to-speech (TTS) systems have been developed and published for Philippine languages, particularly Filipino and Cebuano. However, the performance of existing systems still needs improvement. The limited linguistic resources and scarce speech data available for Philippine languages make it difficult to develop a reliable TTS system for them. In this paper, we implement two TTS systems for Filipino and evaluate their performance: a unit selection system built with MaryTTS and a deep learning system built with Tacotron-2. We applied modifications to the F0 contour and duration in the unit selection system, and used voice conversion to augment the training data of Tacotron-2. The unit selection system achieved a mean opinion score (MOS) of 3.05; however, the boundary-based F0 modification produced perceivable distortions in the output and needs further refinement to be effective. In contrast, using voice conversion to transform the original multi-speaker data into single-speaker data and to produce more training samples raised the overall MOS of Tacotron-2 from 1.51 to 2.01.


Pages: 212-217

Page count: 6
