TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING

Citations: 0
Authors
Renovalles, Edsel Jedd [1]
Lucas, Crisron Rudolf [1]
de Leon, Franz [1]
Aquino, Angelina [1]
Jalandoni, Izza [1]
Affiliations
[1] Univ Philippines Diliman, Elect & Elect Engn Inst, Quezon City, Philippines
Keywords
data augmentation; deep learning; text-to-speech; unit selection; voice conversion
DOI
10.1109/O-COCOSDA202152914.2021.9660431
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Several text-to-speech (TTS) systems have been developed and published for Philippine languages, particularly Filipino and Cebuano. However, the performance of existing systems still needs improvement. Limited linguistic resources and a lack of available speech data make it difficult to develop a reliable TTS system for these languages. In this paper, we implement and evaluate two TTS systems for Filipino: a unit selection system built with MaryTTS and a deep learning system built with Tacotron-2. We applied modifications to the F0 contour and duration in the unit selection system, and used voice conversion to augment the training data for Tacotron-2. The unit selection system achieved a mean opinion score (MOS) of 3.05; however, the boundary-based F0 modification produced perceivable distortions in the output and requires further enhancement to be effective. In contrast, using voice conversion to transform the original multi-speaker data into single-speaker data and to produce more training samples raised the overall MOS of Tacotron-2 from 1.51 to 2.01.
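The abstract reports results as mean opinion scores. As a minimal sketch of how such a score is typically computed, assuming listeners rate each utterance on a 1 to 5 scale, the snippet below averages the ratings and adds a normal-approximation 95% interval. The function name and the ratings are hypothetical and are not taken from the paper, which does not describe its scoring procedure here.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, half_width) for listener ratings on a 1-5 scale.

    half_width is the half-width of a normal-approximation 95% confidence
    interval; this is an illustrative choice, not the paper's procedure.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    mos = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for illustration only (not data from the paper).
hypothetical_ratings = [3, 4, 2, 3, 3, 4, 3, 2]
mos, half_width = mean_opinion_score(hypothetical_ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval next to the mean helps show whether a difference such as 1.51 versus 2.01 is large relative to listener disagreement.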
Pages: 212-217
Page count: 6