Low-Resource Speech Synthesis with Speaker-Aware Embedding

Cited by: 4
Authors
Yang, Li-Jen [1]
Yeh, I-Ping [2]
Chien, Jen-Tzung [1]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Grad Degree Program Cybersecur, Hsinchu, Taiwan
Keywords
low-resource speech synthesis; speaker-aware embedding; encoder-decoder model; transformer; NETWORKS
DOI
10.1109/ISCSLP57327.2022.10038221
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech synthesis maps a text sequence to a speech waveform. High-resource languages have been well studied, with models learned from large amounts of paired text-speech data in public-domain corpora. However, developing speech synthesis for low-resource languages, which matters for speech communication in local regions, is challenging because collecting training data is expensive. In particular, speaker-aware speech generation under low-resource settings is crucial in the real world, and the problem becomes harder when speaker-specific data are very limited. This paper presents speaker-aware speech synthesis under low-resource settings based on a transformer encoder-decoder framework. Knowledge transfer is performed by incorporating a speaker-aware embedding: a transformer is first pretrained on multi-speaker data of a low-populated spoken language and then fine-tuned to a target speaker with very limited speaker-specific embeddings. Experiments on low-resource Taiwanese speech synthesis show the merit of the speaker-aware transformer in terms of Mel cepstral distortion and mean opinion score.
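For readers unfamiliar with the objective metric named in the abstract, the following is a minimal sketch of the commonly used Mel cepstral distortion (MCD) formula, averaged over frames. It is not taken from the paper: the function name `mel_cepstral_distortion` is hypothetical, and the sketch assumes the reference and synthesized frames are already time-aligned and that the caller has dropped the energy coefficient c0 if that is the desired convention.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Return the mean MCD in dB over paired frames.

    ref_frames, syn_frames: equal-length sequences of frames, where each
    frame is a sequence of mel-cepstral coefficients (assumed aligned).
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("frame sequences must be non-empty and equal length")

    # Standard constant: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("paired frames must have the same dimension")
        # Squared Euclidean distance between the two coefficient vectors
        squared_dist = sum((r - s) ** 2 for r, s in zip(ref, syn))
        total += scale * math.sqrt(squared_dist)

    # Average the per-frame distortion over all frames
    return total / len(ref_frames)
```

Lower values indicate that the synthesized spectrum is closer to the reference. In practice the two utterances usually differ in length, so an alignment step (for example dynamic time warping) would be needed before calling a function like this one.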
Pages: 235-239
Number of pages: 5