Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring

Cited by: 3

Authors:

Zou, Yuxiang [1,2]

Dong, Linhao [1,2]

Xu, Bo [1]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

Source:

INTERSPEECH 2019 | 2019

Keywords:

Chinese speech synthesis; multi-task learning; dictionary tutoring

DOI:

10.21437/Interspeech.2019-3233

Chinese Library Classification number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification code:

100104; 100213

Abstract:

Recent character-based end-to-end text-to-speech (TTS) systems have shown promising performance in natural speech generation, especially for English. However, for Chinese TTS, a character-based model is prone to generating speech with wrong pronunciations because of label sparsity. To address this issue, we introduce an additional learning task of character-to-pinyin mapping to boost the pronunciation learning of characters, and leverage a pre-trained dictionary network to correct pronunciation mistakes through joint training. Specifically, our model predicts pinyin labels, where pinyin is the standard phonetic representation of Chinese characters, as an auxiliary task that helps it learn better hidden representations of the characters. The dictionary network acts as a tutor that further helps hidden representation learning. Experiments demonstrate that employing the pinyin auxiliary task and an external dictionary network clearly enhances the naturalness and intelligibility of speech synthesized directly from Chinese character sequences.
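The multi-task idea in the abstract can be illustrated with a minimal sketch of a combined training objective. This is not the authors' implementation: the function names, the L1 spectrogram loss, and the weight `aux_weight` are all assumptions made for illustration, and the dictionary-tutoring term is omitted because the abstract does not describe its form.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability of the target pinyin class."""
    return -math.log(softmax(logits)[target_index])

def l1_loss(predicted, target):
    """Mean absolute error between two equally long frames."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)

def multitask_loss(pred_frames, target_frames,
                   pinyin_logits, pinyin_targets, aux_weight=0.5):
    """Hypothetical objective: main TTS loss plus weighted pinyin loss.

    pred_frames / target_frames: lists of acoustic frames (lists of floats).
    pinyin_logits: one list of class scores per input character.
    pinyin_targets: one pinyin class index per input character.
    aux_weight: assumed trade-off weight, not a value from the paper.
    """
    tts = sum(l1_loss(p, t) for p, t in zip(pred_frames, target_frames))
    tts /= len(target_frames)
    aux = sum(cross_entropy(l, y) for l, y in zip(pinyin_logits, pinyin_targets))
    aux /= len(pinyin_targets)
    return tts + aux_weight * aux, tts, aux

# Toy usage: one perfectly predicted frame, one character with two
# candidate pinyin classes scored equally.
total, tts, aux = multitask_loss(
    pred_frames=[[0.0, 1.0]],
    target_frames=[[0.0, 1.0]],
    pinyin_logits=[[0.0, 0.0]],
    pinyin_targets=[0],
)
print(total, tts, aux)
```

In this sketch the auxiliary term only affects training; at synthesis time the pinyin predictions would not be needed, which is the usual motivation for treating them as an auxiliary task.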


Pages: 2055-2059

Page count: 5

