Effective Data Augmentation Methods for Neural Text-to-Speech Systems

Cited by: 0
Authors
Oh, Suhyeon [1]
Kwon, Ohsung [1]
Hwang, Min-Jae [1]
Kim, Jae-Min [1]
Song, Eunwoo [1]
Affiliations
[1] NAVER Corp, Seongnam, South Korea
Keywords
speech synthesis; self-augmentation; ranking support vector machine
DOI
10.1109/ICEIC54506.2022.9748515
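Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline classification codes
0808; 0809
Abstract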
This paper proposes an effective self-augmentation method for improving the quality of neural text-to-speech (TTS) systems. As synthetic speech quality has greatly improved, creating a neural TTS system using synthetic corpora is now possible. However, whether increasing the amount of synthetic data is always beneficial for improving training efficiency has not been verified. Our aim in this study is to selectively choose synthetic data whose characteristics are close to those of natural speech. Specifically, we adopt a ranking support vector machine (RankSVM), which is well known for effectively ranking relative attributes among binary classes. By setting the synthetic and recorded corpora as two opposite classes, RankSVM is used to determine how acoustically similar the synthesized speech is to the recorded data. As training data can be selectively chosen from large-scale synthetic corpora, the performance of the TTS model re-trained with those data is significantly improved. Subjective evaluation results verify that the proposed TTS model performs much better than both the original model trained with recorded data alone and a similarly configured system re-trained with all the synthetic data without any selection method.
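The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional feature vectors, the hyperparameters, and the helper names `train_rank_svm`, `score`, and `select_top` are all assumptions made for illustration, and the record does not say which acoustic features the paper uses. The sketch trains a linear ranker with a pairwise hinge loss (the usual RankSVM formulation) so that recorded utterances score higher than synthetic ones, then keeps the synthetic utterances with the highest scores.

```python
import random

def train_rank_svm(recorded, synthetic, epochs=200, lr=0.05, reg=0.01, seed=0):
    """Learn weights w so that w.x_recorded exceeds w.x_synthetic by a margin.

    Uses subgradient descent on an L2-regularized pairwise hinge loss.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(recorded[0])
    w = [0.0] * dim
    # Every (recorded, synthetic) pair is a ranking constraint.
    pairs = [(r, s) for r in recorded for s in synthetic]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for r, s in pairs:
            diff = [a - b for a, b in zip(r, s)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            violated = margin < 1.0
            for i in range(dim):
                # Shrink w every step; push along diff only when the margin is violated.
                grad = reg * w[i] - (diff[i] if violated else 0.0)
                w[i] -= lr * grad
    return w

def score(w, x):
    """Ranking score: higher means closer to the recorded class."""
    return sum(wi * xi for wi, xi in zip(w, x))

def select_top(w, candidates, k):
    """Keep the k candidates that rank closest to recorded speech."""
    return sorted(candidates, key=lambda x: score(w, x), reverse=True)[:k]

# Hypothetical toy features (one vector per utterance), not real acoustic data.
recorded = [[1.0, 0.2], [0.9, -0.1], [1.1, 0.0]]
synthetic = [[0.0, 0.1], [0.2, -0.2], [-0.1, 0.0], [0.8, 0.0]]

w = train_rank_svm(recorded, synthetic)
selected = select_top(w, synthetic, k=1)
print(selected)
```

In this toy setup the synthetic utterance whose features lie nearest the recorded ones is the one retained; in a real pipeline the retained subset would be added to the TTS training data.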
Pages: 4