Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring

Cited by: 3
Authors
Zou, Yuxiang [1 ,2 ]
Dong, Linhao [1 ,2 ]
Xu, Bo [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
Keywords
Chinese speech synthesis; multi-task learning; dictionary tutoring;
DOI
10.21437/Interspeech.2019-3233
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Recent character-based end-to-end text-to-speech (TTS) systems have shown promising performance in natural speech generation, especially for English. However, for Chinese TTS, a character-based model is prone to generating speech with wrong pronunciations because of label sparsity. To address this issue, we introduce an additional learning task of character-to-pinyin mapping to boost the pronunciation learning of characters, and leverage a pre-trained dictionary network to correct pronunciation mistakes through joint training. Specifically, our model predicts pinyin labels as an auxiliary task to help it learn better hidden representations of Chinese characters, where pinyin is the standard phonetic representation of Chinese characters. The dictionary network acts as a tutor that further helps hidden representation learning. Experiments demonstrate that employing the pinyin auxiliary task and an external dictionary network clearly enhances the naturalness and intelligibility of speech synthesized directly from Chinese character sequences.
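The auxiliary-task idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the training objective is the main synthesis loss plus a weighted cross-entropy over per-character pinyin predictions. The function names, the `aux_weight` parameter, and its default value are hypothetical, and the dictionary tutor network is left out.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against a single target class index."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[target_index]


def multitask_loss(synthesis_loss, pinyin_logits, pinyin_targets, aux_weight=0.5):
    """Main TTS loss plus a weighted mean character-to-pinyin loss.

    aux_weight is a hypothetical hyperparameter; the abstract does not say
    how the two losses are balanced.
    """
    per_char = [cross_entropy(l, t) for l, t in zip(pinyin_logits, pinyin_targets)]
    auxiliary = sum(per_char) / len(per_char)
    return synthesis_loss + aux_weight * auxiliary


# Two characters, three candidate pinyin classes each (toy values).
logits = [[2.0, 0.1, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
total = multitask_loss(synthesis_loss=1.0, pinyin_logits=logits, pinyin_targets=targets)
```

In this sketch a wrong pinyin prediction for a character raises the total loss even when the acoustic loss is unchanged. That extra pressure is the kind of signal the auxiliary task is meant to put on the character representations.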
Pages: 2055-2059
Page count: 5
Related papers
50 in total
  • [31] HIERARCHICAL MULTI-TASK LEARNING VIA TASK AFFINITY GROUPINGS
    Srivastava, Siddharth
    Bhugra, Swati
    Kaushik, Vinay
    Lall, Brejesh
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3289 - 3293
  • [32] Poster Abstract: Speech Emotion Recognition via Attention-based DNN from Multi-Task Learning
    Ma, Fei
    Gu, Weixi
    Zhang, Wei
    Ni, Shiguang
    Huang, Shao-Lun
    Zhang, Lin
    SENSYS'18: PROCEEDINGS OF THE 16TH CONFERENCE ON EMBEDDED NETWORKED SENSOR SYSTEMS, 2018, : 363 - 364
  • [33] MULTI-OBJECTIVE MULTI-TASK LEARNING ON RNNLM FOR SPEECH RECOGNITION
    Song, Minguang
    Zhao, Yunxin
    Wang, Shaojun
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 197 - 203
  • [34] Multi-task analysis discriminative dictionary learning for one-class learning
    Liu, Bo
    Xie, Haoxin
    Xiao, Yanshan
    KNOWLEDGE-BASED SYSTEMS, 2021, 227
  • [35] Event Detection via Context Understanding Based on Multi-task Learning
    Xia, Jing
    Li, Xiaolong
    Tan, Yongbin
    Zhang, Wu
    Li, Dajun
    Xiong, Zhengkun
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (01)
  • [36] MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
    Ghosh, Sreyan
    Tyagi, Utkarsh
    Ramaneswaran, S.
    Srivastava, Harshvardhan
    Manocha, Dinesh
    INTERSPEECH 2023, 2023, : 1209 - 1213
  • [37] Boosting Multi-task Learning Through Combination of Task Labels - with Applications in ECG Phenotyping
    Hsieh, Ming-En
    Tseng, Vincent S.
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7771 - 7779
  • [38] Fuzzy Multi-task Learning for Hate Speech Type Identification
    Liu, Han
    Burnap, Pete
    Alorainy, Wafa
    Williams, Matthew L.
    WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 3006 - 3012
  • [39] Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning
    Zhengqi Wen
    Kehuang Li
    Zhen Huang
    Chin-Hui Lee
    Jianhua Tao
    Journal of Signal Processing Systems, 2018, 90 : 1025 - 1037
  • [40] A multi-task learning speech synthesis optimization method based on CWT: a case study of Tacotron2
    Guoqiang Hu
    Zhuofan Ruan
    Wenqiu Guo
    Yujuan Quan
    EURASIP Journal on Advances in Signal Processing, 2024