Low-Resource Speech Synthesis with Speaker-Aware Embedding

Cited by: 4

Authors:

Yang, Li-Jen [1]

Yeh, I-Ping [2]

Chien, Jen-Tzung [1]

Affiliations:

[1] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu, Taiwan

[2] Natl Yang Ming Chiao Tung Univ, Grad Degree Program Cybersecur, Hsinchu, Taiwan

Source:

2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022

Keywords:

low-resource speech synthesis; speaker-aware embedding; encoder-decoder model; transformer; NETWORKS;

DOI:

10.1109/ISCSLP57327.2022.10038221

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech synthesis has been successfully exploited for mapping a text sequence to a speech waveform, where high-resource languages have been well studied and learned from large amounts of paired text-speech data in public-domain corpora. However, developing speech synthesis for low-resource languages is challenging for speech communication in local regions, since collecting training data is expensive. In particular, speaker-aware speech generation under low-resource settings is crucial in the real world. The problem becomes harder still when speaker-specific data are very limited. This paper presents speaker-aware speech synthesis under low-resource settings, based on an encoder-decoder framework using a transformer. Knowledge transfer is performed by incorporating a speaker-aware embedding: a transformer is first pretrained on multi-speaker data of a low-populated spoken language and then fine-tuned to a target speaker with very limited speaker-specific embeddings. Experiments on low-resource Taiwanese speech synthesis show the merit of the speaker-aware transformer in terms of Mel cepstral distortion and mean opinion score.

Pages: 235-239

Page count: 5
