Text-to-Speech for Low-Resource Agglutinative Language With Morphology-Aware Language Model Pre-Training

Cited by: 8
Authors
Liu, Rui [1 ]
Hu, Yifan [1 ]
Zuo, Haolin [1 ]
Luo, Zhaojie [2 ]
Wang, Longbiao [3 ]
Gao, Guanglai [1 ]
Affiliations
[1] Inner Mongolia Univ, Dept Comp Sci, Hohhot 010021, Peoples R China
[2] Osaka Univ, SANKEN, Osaka 5670047, Japan
[3] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text-to-speech (TTS); agglutinative; morphology; language modeling; pre-training;
DOI
10.1109/TASLP.2023.3348762
Chinese Library Classification (CLC) Number
O42 [Acoustics];
Discipline Classification Code
070206; 082403;
Abstract
Text-to-Speech (TTS) aims to convert input text into a human-like voice. With the development of deep learning, encoder-decoder based TTS models achieve superior naturalness in mainstream languages such as Chinese and English, and the linguistic information learning capability of the text encoder is key to this performance. However, for TTS of low-resource agglutinative languages, the scale of the <text, speech> paired data is limited. How to extract rich linguistic information from small-scale text data to enhance the naturalness of the synthesized speech is therefore an urgent issue. In this paper, we first collect a large-scale unsupervised text corpus for BERT-like language model pre-training, and then adopt the trained language model to extract deep linguistic information from the input text of the TTS model to improve the naturalness of the final synthesized speech. To fully exploit the prosody-related linguistic information in agglutinative languages, we incorporate morphological information into the language model training and construct a morphology-aware masking based BERT model (MAM-BERT). Experimental results based on various advanced TTS models validate the effectiveness of our approach, and comparisons across data scales further confirm its effectiveness in low-resource scenarios.
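The core idea behind MAM-BERT, as described in the abstract, is to mask whole morphemes (rather than independent subword tokens) during masked-language-model pre-training, so that the model must predict linguistically meaningful units such as stems and suffixes. Below is a minimal sketch of one such masking step, assuming a morphological analyzer has already produced token-level morpheme spans; the function name morphology_aware_mask, the 15% masking budget, and the 80/10/10 replacement split are standard BERT conventions used here for illustration, not the paper's exact recipe.

```python
import random

MASK_TOKEN = "[MASK]"
MASK_RATE = 0.15  # fraction of tokens selected for prediction (standard BERT value)


def morphology_aware_mask(tokens, morpheme_spans, vocab, seed=None):
    """Mask whole morphemes instead of independent subword tokens.

    tokens         : list[str], subword tokens of one sentence
    morpheme_spans : list of (start, end) token-index ranges, one per morpheme,
                     produced by an external morphological analyzer (assumed)
    vocab          : list[str], used for the 10% random-replacement case
    Returns (masked_tokens, labels): labels[i] holds the original token at
    positions selected for prediction and None elsewhere.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)

    budget = max(1, round(len(tokens) * MASK_RATE))  # how many tokens to mask
    spans = list(morpheme_spans)
    rng.shuffle(spans)  # pick morphemes in random order until the budget is spent

    covered = 0
    for start, end in spans:
        if covered >= budget:
            break
        for i in range(start, end):          # mask every subword of this morpheme
            labels[i] = tokens[i]
            r = rng.random()
            if r < 0.8:
                masked[i] = MASK_TOKEN       # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = rng.choice(vocab)  # 10%: random token
            # remaining 10%: keep the original token unchanged
        covered += end - start
    return masked, labels


# Toy example with a hypothetical stem+suffix segmentation:
tokens = ["sur", "##gaguli", "-du", "yabu", "-na"]
spans = [(0, 2), (2, 3), (3, 4), (4, 5)]   # ("sur", "##gaguli") is one morpheme
masked, labels = morphology_aware_mask(tokens, spans, vocab=tokens, seed=0)
print(masked)
print(labels)
```

In a full pipeline, batches masked this way would drive ordinary masked-token prediction during pre-training, and the resulting encoder would then supply contextual embeddings to the TTS text encoder as additional linguistic features.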
Pages: 1075 - 1087
Number of pages: 13