TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING

Authors:

Renovalles, Edsel Jedd [1]

Lucas, Crisron Rudolf [1]

de Leon, Franz [1]

Aquino, Angelina [1]

Jalandoni, Izza [1]

Affiliation:

[1] University of the Philippines Diliman, Electrical and Electronics Engineering Institute, Quezon City, Philippines

Source:

2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2021

Keywords:

data augmentation; deep learning; text-to-speech; unit selection; voice conversion

DOI:

10.1109/O-COCOSDA202152914.2021.9660431

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Several text-to-speech (TTS) systems have been developed and published for Philippine languages, particularly Filipino and Cebuano. However, the performance of existing systems still needs improvement. The limited linguistic resources and scarce speech data available for Philippine languages make it difficult to develop a reliable TTS system for them. In this paper, we implement and evaluate two TTS systems for Filipino: a unit selection system built with MaryTTS and a deep learning system based on Tacotron-2. We experimented with modifying the F0 contour and duration in the unit selection system, and used voice conversion to augment the training data of Tacotron-2. The unit selection system achieved a mean opinion score (MOS) of 3.05; however, the boundary-based F0 modification produced perceivable distortions in the output and needs further refinement to be effective. In contrast, using voice conversion to transform the original multi-speaker data into single-speaker data and to produce more training samples raised the overall MOS of Tacotron-2 from 1.51 to 2.01.

Pages: 212-217

Page count: 6

