Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring

被引:3
|
作者
Zou, Yuxiang [1 ,2 ]
Dong, Linhao [1 ,2 ]
Xu, Bo [1 ]
机构
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
来源
关键词
Chinese speech synthesis; multi-task learning; dictionary tutoring;
D O I
10.21437/Interspeech.2019-3233
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
Recent character-based end-to-end text-to-speech (TTS) systems have shown promising performance in natural speech generation, especially for English. However, for Chinese TTS, the character-based model is easy to generate speech with wrong pronunciation due to the label sparsity issue. To address this issue, we introduce an additional learning task of character-to-pinyin mapping to boost the pronunciation learning of characters, and leverage a pre-trained dictionary network to correct the pronunciation mistake through joint training. Specifically, our model predicts pinyin labels as an auxiliary task to assist learning better hidden representations of Chinese characters, where pinyin is a standard phonetic representation for Chinese characters. The dictionary network plays a role as a tutor to further help hidden representation learning. Experiments demonstrate that employing the pinyin auxiliary task and an external dictionary network clearly enhances the naturalness and intelligibility of the synthetic speech directly from the Chinese character sequences.
引用
收藏
页码:2055 / 2059
页数:5
相关论文
共 50 条
  • [1] Boosting Character-based Mandarin ASR via Chinese Pinyin Representation
    Li L.
    Long Y.
    Xu D.
    Li Y.
    International Journal of Speech Technology, 2023, 26 (04) : 895 - 902
  • [2] Spatially Augmented Speech Bubble to Character Association via Comic Multi-task Learning
    Soykan, Gurkan
    Yuret, Deniz
    Sezgin, Tevfik Metin
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024 WORKSHOPS, PT I, 2024, 14935 : 231 - 256
  • [3] Online Multi-Task Learning via Sparse Dictionary Optimization
    Ruvolo, Paul
    Eaton, Eric
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 2062 - 2068
  • [4] Speech Emotion Recognition based on Multi-Task Learning
    Zhao, Huijuan
    Han Zhijie
    Wang, Ruchuan
    2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019, : 186 - 188
  • [5] Speech Emotion Recognition using Decomposed Speech via Multi-task Learning
    Hsu, Jia-Hao
    Wu, Chung-Hsien
    Wei, Yu-Hung
    INTERSPEECH 2023, 2023, : 4553 - 4557
  • [6] Boosting Share Routing for Multi-task Learning
    Chen, Xiaokai
    Gu, Xiaoguang
    Fu, Libo
    WEB CONFERENCE 2021: COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2021), 2021, : 372 - 379
  • [7] A multi-task transfer learning method with dictionary learning
    Zheng, Xin
    Lin, Luyue
    Liu, Bo
    Xiao, Yanshan
    Xiong, Xiaoming
    KNOWLEDGE-BASED SYSTEMS, 2020, 191
  • [8] Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
    Toth, Laszlo
    Gosztolya, Gabor
    Grosz, Tamas
    Marko, Alexandra
    Csapo, Tamas Gabor
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3172 - 3176
  • [9] Multi-Task Chinese Speech Recognition Method Based on the Squeezeformer Model
    Guo, Ying
    Wang, Li
    IAENG International Journal of Computer Science, 2025, 52 (01) : 23 - 31
  • [10] Speech Emotion Recognition with Multi-task Learning
    Cai, Xingyu
    Yuan, Jiahong
    Zheng, Renjie
    Huang, Liang
    Church, Kenneth
    INTERSPEECH 2021, 2021, : 4508 - 4512