Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring

Cited by: 3
Authors
Zou, Yuxiang [1, 2]
Dong, Linhao [1, 2]
Xu, Bo [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
Keywords
Chinese speech synthesis; multi-task learning; dictionary tutoring;
DOI
10.21437/Interspeech.2019-3233
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Recent character-based end-to-end text-to-speech (TTS) systems have shown promising performance in natural speech generation, especially for English. For Chinese TTS, however, a character-based model is prone to generating speech with wrong pronunciations because of label sparsity. To address this issue, we introduce an additional learning task of character-to-pinyin mapping to boost the pronunciation learning of characters, and leverage a pre-trained dictionary network to correct pronunciation mistakes through joint training. Specifically, our model predicts pinyin labels, where pinyin is the standard phonetic representation of Chinese characters, as an auxiliary task that helps it learn better hidden representations of the characters. The dictionary network acts as a tutor that further helps hidden representation learning. Experiments demonstrate that employing the pinyin auxiliary task and an external dictionary network clearly enhances the naturalness and intelligibility of speech synthesized directly from Chinese character sequences.
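The abstract does not state how the auxiliary pinyin task enters the training objective. As a rough illustration only, the sketch below assumes a common multi-task formulation: the main synthesis loss plus a weighted cross-entropy over per-character pinyin predictions. The names `multitask_loss` and `aux_weight`, the toy vocabulary, and all numeric values are hypothetical and are not taken from the paper; the dictionary-tutoring component is not modeled here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target_index])


def multitask_loss(tts_loss, pinyin_logits, pinyin_targets, aux_weight=0.5):
    """Combine a main TTS loss with an auxiliary character-to-pinyin loss.

    Assumption (not from the paper): a weighted sum with a fixed
    `aux_weight`. Returns (total_loss, auxiliary_loss).
    """
    per_char = [cross_entropy(l, t) for l, t in zip(pinyin_logits, pinyin_targets)]
    aux_loss = sum(per_char) / len(per_char)
    return tts_loss + aux_weight * aux_loss, aux_loss


# Toy example: two characters, a three-entry pinyin vocabulary (hypothetical).
PINYIN_VOCAB = ["zhong1", "guo2", "hao3"]
pinyin_logits = [[2.0, 0.1, -1.0], [0.0, 3.0, 0.5]]  # one row per character
pinyin_targets = [0, 1]                              # indices into PINYIN_VOCAB

total, aux = multitask_loss(1.25, pinyin_logits, pinyin_targets)
print(f"auxiliary pinyin loss: {aux:.4f}, total loss: {total:.4f}")
```

In this formulation the auxiliary term only shapes the shared character representations during training; setting `aux_weight` to 0 recovers the plain character-based objective.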
Pages: 2055-2059
Number of pages: 5