Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis

Cited by: 1
Authors
Peng, Yukun [1]
Ling, Zhenhua [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
Source
Funding
National Key Research and Development Program of China
Keywords
text-to-speech; speech synthesis; multilingual; meta-learning;
DOI
10.21437/Interspeech.2022-831
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper presents a method of decoupled pronunciation and prosody modeling to improve the performance of meta-learning-based multilingual speech synthesis. The baseline meta-learning synthesis method adopts a single text encoder with a parameter generator conditioned on language embeddings and a single decoder to predict mel-spectrograms for all languages. In contrast, our proposed method designs a two-stream model structure that contains two encoders and two decoders for pronunciation and prosody modeling, respectively, considering that the pronunciation knowledge and the prosody knowledge should be shared in different ways among languages. In our experiments, our proposed method effectively improved the intelligibility and naturalness of multilingual speech synthesis compared with the baseline meta-learning synthesis method.
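The "parameter generator conditioned on language embeddings" in the baseline can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a purely linear generator, toy dimensions, and hypothetical names (`make_generator`, `generate_layer`, `apply_layer`). It only shows the general idea that one shared generator produces different layer weights for different language embeddings.

```python
import random


def make_generator(lang_dim, in_dim, out_dim, seed=0):
    """Build a toy linear parameter generator (hypothetical, for illustration).

    It holds one row of coefficients per generated parameter, so a language
    embedding of size lang_dim maps to in_dim * out_dim weights plus out_dim biases.
    """
    rng = random.Random(seed)
    n_params = in_dim * out_dim + out_dim
    return [[rng.uniform(-0.1, 0.1) for _ in range(lang_dim)]
            for _ in range(n_params)]


def generate_layer(generator, lang_emb, in_dim, out_dim):
    """Generate the weights and bias of one linear layer from a language embedding."""
    flat = [sum(g * e for g, e in zip(row, lang_emb)) for row in generator]
    weights = [flat[i * in_dim:(i + 1) * in_dim] for i in range(out_dim)]
    bias = flat[out_dim * in_dim:]
    return weights, bias


def apply_layer(weights, bias, x):
    """Apply the generated linear layer to one input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Assumed toy sizes; real systems use far larger, learned embeddings.
LANG_DIM, IN_DIM, OUT_DIM = 4, 3, 2
generator = make_generator(LANG_DIM, IN_DIM, OUT_DIM)

lang_a = [1.0, 0.0, 0.0, 0.0]  # placeholder embedding for one language
lang_b = [0.0, 1.0, 0.0, 0.0]  # placeholder embedding for another language
x = [0.5, -1.0, 2.0]           # placeholder text feature vector

w_a, b_a = generate_layer(generator, lang_a, IN_DIM, OUT_DIM)
w_b, b_b = generate_layer(generator, lang_b, IN_DIM, OUT_DIM)
y_a = apply_layer(w_a, b_a, x)
y_b = apply_layer(w_b, b_b, x)
```

Under this sketch, the two-stream structure described in the abstract would amount to keeping separate encoder and decoder pairs for pronunciation and prosody, each free to share knowledge across languages in its own way; how the paper actually does this is not specified in the abstract.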
Pages: 4257-4261
Number of pages: 5