Effective Data Augmentation Methods for Neural Text-to-Speech Systems

Authors
Oh, Suhyeon [1]
Kwon, Ohsung [1]
Hwang, Min-Jae [1]
Kim, Jae-Min [1]
Song, Eunwoo [1]
Affiliations
[1] NAVER Corp, Seongnam, South Korea
Keywords
speech synthesis; self-augmentation; ranking support vector machine
DOI
10.1109/ICEIC54506.2022.9748515
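Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809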
Abstract
This paper proposes an effective self-augmentation method for improving the quality of neural text-to-speech (TTS) systems. As the quality of synthetic speech has greatly improved, it is now possible to build a neural TTS system from synthetic corpora. However, it has not been verified whether increasing the amount of synthetic data is always beneficial for improving training efficiency. Our aim in this study is to select synthetic data whose characteristics are close to those of natural speech. Specifically, we adopt a ranking support vector machine (RankSVM), which is well known for effectively ranking relative attributes between binary classes. Treating the synthetic and recorded corpora as two opposite classes, we use RankSVM to determine how acoustically similar the synthesized speech is to the recorded data. Because training data can be selectively chosen from large-scale synthetic corpora, the performance of the TTS model retrained on those data is significantly improved. Subjective evaluation results verify that the proposed TTS model performs much better than both the original model trained with recorded data alone and a similarly configured system retrained with all the synthetic data without any selection method.
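The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the acoustic features, the solver, or the selection threshold, so the two-dimensional feature vectors, the function names (`train_rank_svm`, `score`, `select_top`), the hyperparameters, and the plain subgradient-descent solver below are all assumptions made for illustration. The sketch learns a linear ranking function with a pairwise hinge loss so that recorded utterances score higher than synthetic ones, then keeps the synthetic utterances that score closest to the recorded class.

```python
import random


def train_rank_svm(recorded, synthetic, epochs=200, lr=0.01, reg=0.01, seed=0):
    """Fit a linear ranker w so that w.x is larger for recorded than for synthetic.

    Illustrative solver only: stochastic subgradient descent on the
    L2-regularized pairwise hinge loss max(0, 1 - w.(x_rec - x_syn)).
    """
    rng = random.Random(seed)
    dim = len(recorded[0])
    w = [0.0] * dim
    # Every (recorded, synthetic) pair is one ranking constraint.
    pairs = [(r, s) for r in recorded for s in synthetic]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for r, s in pairs:
            diff = [a - b for a, b in zip(r, s)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            violated = margin < 1.0
            for i in range(dim):
                # Shrink w (regularizer); push along diff only if the pair violates the margin.
                grad = reg * w[i] - (diff[i] if violated else 0.0)
                w[i] -= lr * grad
    return w


def score(w, x):
    """Ranking score; higher means closer to the recorded class."""
    return sum(wi * xi for wi, xi in zip(w, x))


def select_top(w, synthetic, k):
    """Return indices of the k synthetic items with the highest ranking score."""
    order = sorted(range(len(synthetic)), key=lambda i: score(w, synthetic[i]), reverse=True)
    return order[:k]


# Hypothetical 2-D acoustic features (made-up numbers, not from the paper).
recorded = [[1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
synthetic = [[0.8, 0.9], [0.1, 0.2], [0.5, 0.4], [0.0, 0.1]]

w = train_rank_svm(recorded, synthetic)
selected = select_top(w, synthetic, k=2)
print(selected)  # indices of the synthetic utterances kept for retraining
```

In this toy setting, the utterances at the returned indices would be added to the recorded corpus before retraining the TTS model. How many utterances to keep, and which features to rank on, are design choices the abstract leaves open.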
Pages: 4