TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING

Authors
Renovalles, Edsel Jedd [1]
Lucas, Crisron Rudolf [1]
de Leon, Franz [1]
Aquino, Angelina [1]
Jalandoni, Izza [1]
Affiliation
[1] University of the Philippines Diliman, Electrical and Electronics Engineering Institute, Quezon City, Philippines
Keywords
data augmentation; deep learning; text-to-speech; unit selection; voice conversion
DOI
10.1109/O-COCOSDA202152914.2021.9660431
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Several text-to-speech (TTS) systems have been developed and published for Philippine languages, particularly Filipino and Cebuano, but their performance still needs improvement. Limited linguistic resources and a lack of available speech data make it difficult to develop a reliable TTS system for these languages. In this paper, we implement and evaluate two TTS systems for Filipino: a unit selection system built with MaryTTS and a deep learning system built with Tacotron-2. We experimented with modifying the F0 contour and duration in the unit selection system, and used voice conversion to augment the training data for Tacotron-2. The unit selection system achieved a mean opinion score (MOS) of 3.05; however, the boundary-based F0 modification produced perceivable distortions in the output and needs further enhancement to be effective. In contrast, using voice conversion to transform the original multi-speaker data into single-speaker data and to produce more training samples raised the overall MOS of Tacotron-2 from 1.51 to 2.01.
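For readers unfamiliar with the metric, the following is a minimal sketch of how a mean opinion score is typically computed and how the reported Tacotron-2 change can be expressed as a relative gain. It assumes the conventional 1 to 5 listener rating scale, which the abstract does not state. The function names and the example ratings are hypothetical and are not taken from the paper; only the values 1.51 and 2.01 come from the abstract.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings, assuming the conventional 1-5 scale."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must lie in the range 1 to 5")
    return mean(ratings)


def relative_gain(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Hypothetical ratings for illustration only; not data from the paper.
example_ratings = [3, 4, 2, 3, 3]
example_mos = mean_opinion_score(example_ratings)

# MOS values reported in the abstract for Tacotron-2,
# without and with voice-conversion data augmentation.
tacotron_gain = relative_gain(1.51, 2.01)

print(f"example MOS: {example_mos:.2f}")
print(f"Tacotron-2 relative MOS gain: {tacotron_gain:.1%}")
```

Under these assumptions, the reported change corresponds to an absolute gain of 0.50 MOS points, or roughly one third relative to the baseline.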
Pages: 212-217
Number of pages: 6