TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING

Times cited: 0
Authors
Renovalles, Edsel Jedd [1]
Lucas, Crisron Rudolf [1]
de Leon, Franz [1]
Aquino, Angelina [1]
Jalandoni, Izza [1]
Affiliation
[1] Univ Philippines Diliman, Elect & Elect Engn Inst, Quezon City, Philippines
Source
2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2021
Keywords
data augmentation; deep learning; text-to-speech; unit selection; voice conversion
DOI
10.1109/O-COCOSDA202152914.2021.9660431
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Several text-to-speech (TTS) systems have been developed and published for Philippine languages, particularly Filipino and Cebuano. However, the performance of existing systems still needs improvement. Limited linguistic resources and a lack of available speech data make it difficult to develop a reliable TTS system for these languages. In this paper, we implement and evaluate two TTS systems for Filipino: a unit selection system built with MaryTTS and a deep learning system based on Tacotron-2. We experimented with modifying the F0 contour and duration in the unit selection system, and used voice conversion to augment the training data of Tacotron-2. The unit selection system achieved a mean opinion score (MOS) of 3.05; however, the boundary-based F0 modification produced perceivable distortions in the output and requires further refinement to be effective. In contrast, using voice conversion to transform the original multi-speaker data into single-speaker data, and thereby produce more training samples, raised the overall MOS of Tacotron-2 from 1.51 to 2.01.
Pages: 212-217
Page count: 6