Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring

Cited by: 3
Authors
Zou, Yuxiang [1 ,2 ]
Dong, Linhao [1 ,2 ]
Xu, Bo [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
Keywords
Chinese speech synthesis; multi-task learning; dictionary tutoring
DOI
10.21437/Interspeech.2019-3233
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Recent character-based end-to-end text-to-speech (TTS) systems have shown promising performance in natural speech generation, especially for English. For Chinese TTS, however, a character-based model is prone to generating speech with wrong pronunciations because of label sparsity. To address this issue, we introduce an additional learning task of character-to-pinyin mapping to boost the pronunciation learning of characters, and leverage a pre-trained dictionary network to correct pronunciation mistakes through joint training. Specifically, our model predicts pinyin labels, where pinyin is the standard phonetic representation of Chinese characters, as an auxiliary task that helps it learn better hidden representations of the characters. The dictionary network acts as a tutor that further helps hidden representation learning. Experiments demonstrate that employing the pinyin auxiliary task and an external dictionary network clearly enhances the naturalness and intelligibility of speech synthesized directly from Chinese character sequences.
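The training objective described in the abstract can be pictured as a weighted sum of three terms: the main synthesis loss, the auxiliary character-to-pinyin classification loss, and a term that pulls the character hidden representations toward those of the pre-trained dictionary network. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the loss functions or weights, so the mean-squared-error synthesis and tutor terms, the cross-entropy pinyin term, and the names `multitask_loss`, `pinyin_weight`, and `tutor_weight` are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class."""
    return -math.log(softmax(logits)[target_index])


def mean_squared_error(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multitask_loss(mel_pred, mel_target, pinyin_logits, pinyin_targets,
                   char_hidden, tutor_hidden,
                   pinyin_weight=0.5, tutor_weight=0.5):
    """Combine the three terms (weights are assumed, not from the paper).

    mel_pred / mel_target: one spectrogram frame (list of floats) per step.
    pinyin_logits / pinyin_targets: per-character scores and label indices.
    char_hidden / tutor_hidden: per-character hidden vectors from the TTS
    encoder and from the (hypothetical) dictionary network.
    """
    synthesis = sum(mean_squared_error(p, t)
                    for p, t in zip(mel_pred, mel_target)) / len(mel_pred)
    pinyin = sum(cross_entropy(l, y)
                 for l, y in zip(pinyin_logits, pinyin_targets)) / len(pinyin_targets)
    tutor = sum(mean_squared_error(h, d)
                for h, d in zip(char_hidden, tutor_hidden)) / len(char_hidden)
    total = synthesis + pinyin_weight * pinyin + tutor_weight * tutor
    return total, {"synthesis": synthesis, "pinyin": pinyin, "tutor": tutor}


if __name__ == "__main__":
    total, parts = multitask_loss(
        mel_pred=[[0.1, 0.9]], mel_target=[[0.0, 1.0]],
        pinyin_logits=[[2.0, 0.5, 0.1]], pinyin_targets=[0],
        char_hidden=[[0.3, 0.7]], tutor_hidden=[[0.2, 0.8]],
    )
    print(total, parts)
```

In this sketch the auxiliary terms only shape the shared character representations during training; at synthesis time only the main branch would be needed.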
Pages: 2055-2059
Page count: 5
Related papers
50 in total
  • [21] Meta Multi-task Learning for Speech Emotion Recognition
    Cai, Ruichu
    Guo, Kaibin
    Xu, Boyan
    Yang, Xiaoyan
    Zhang, Zhenjie
    INTERSPEECH 2020, 2020, : 3336 - 3340
  • [22] Adaptive multi-task learning for speech to text translation
    Feng, Xin
    Zhao, Yue
    Zong, Wei
    Xu, Xiaona
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01):
  • [23] Chinese Named Entity Recognition Model Based on Multi-Task Learning
    Fang, Qin
    Li, Yane
    Feng, Hailin
    Ruan, Yaoping
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [24] MASS: Multi-task anthropomorphic speech synthesis framework
    Chen, Jinyin
    Ye, Linhui
    Ming, Zhaoyan
    COMPUTER SPEECH AND LANGUAGE, 2021, 70
  • [25] A novel boosting algorithm for multi-task learning based on the Itakuda-Saito divergence
    Takenouchi, Takashi
    Komori, Osamu
    Eguchi, Shinto
    BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING (MAXENT 2014), 2015, 1641 : 230 - 237
  • [26] Attention-based LSTM with Multi-task Learning for Distant Speech Recognition
    Zhang, Yu
    Zhang, Pengyuan
    Yan, Yonghong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3857 - 3861
  • [27] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao Huijuan
    Ye Ning
    Wang Ruchuan
    Journal of Signal Processing Systems, 2021, 93 : 299 - 308
  • [28] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao, Huijuan
    Ye, Ning
    Wang, Ruchuan
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2021, 93 (2-3): : 299 - 308
  • [29] Multi-task Joint Sparse Representation Classification Based on Fisher Discrimination Dictionary Learning
    Wang, Rui
    Shen, Miaomiao
    Li, Yanping
    Gomes, Samuel
    CMC-COMPUTERS MATERIALS & CONTINUA, 2018, 57 (01): : 25 - 48
  • [30] WAVELET-BASED DECOMPOSITION OF F0 AS A SECONDARY TASK FOR DNN-BASED SPEECH SYNTHESIS WITH MULTI-TASK LEARNING
    Ribeiro, Manuel Sam
    Watts, Oliver
    Yamagishi, Junichi
    Clark, Robert A. J.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5525 - 5529