Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis

Cited by: 1
Authors
Peng, Yukun [1 ]
Ling, Zhenhua [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
Source
Funding
National Key Research and Development Program of China;
Keywords
text-to-speech; speech synthesis; multilingual; meta-learning;
DOI
10.21437/Interspeech.2022-831
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a method of decoupled pronunciation and prosody modeling to improve the performance of meta-learning-based multilingual speech synthesis. The baseline meta-learning synthesis method adopts a single text encoder with a parameter generator conditioned on language embeddings and a single decoder to predict mel-spectrograms for all languages. In contrast, our proposed method designs a two-stream model structure that contains two encoders and two decoders for pronunciation and prosody modeling, respectively, considering that the pronunciation knowledge and the prosody knowledge should be shared in different ways among languages. In our experiments, our proposed method effectively improved the intelligibility and naturalness of multilingual speech synthesis compared with the baseline meta-learning synthesis method.
Pages: 4257-4261
Page count: 5
Related papers
50 in total
  • [31] Study of prosody model on Chinese speech synthesis based on the classification of syllabic prosody features
    Tao, Jianhua
    Cai, Lianhong
    Shengxue Xuebao/Acta Acustica, 2003, 28 (05): : 395 - 402
  • [32] Ensemble Meta-Learning-Based Robust Chipping Prediction for Wafer Dicing
    Chang, Bao Rong
    Tsai, Hsiu-Fen
    Mo, Hsiang-Yu
    ELECTRONICS, 2024, 13 (10)
  • [33] TOWARDS LIFELONG LEARNING OF MULTILINGUAL TEXT-TO-SPEECH SYNTHESIS
    Yang, Mu
    Ding, Shaojin
    Chen, Tianlong
    Wang, Tong
    Wang, Zhangyang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8022 - 8026
  • [34] Phonetics and Machine Learning: Hierarchical Modelling of Prosody in Statistical Speech Synthesis
    Vainio, Martti
    STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2014, 2014, 8791 : 37 - 54
  • [35] Meta-Learning-Based Incremental Few-Shot Object Detection
    Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
    [Affiliation unknown], 200092, China
    [Affiliation unknown], 201210, China
    IEEE Trans Circuits Syst Video Technol, 2022, 4 (2158-2169):
  • [36] A Meta-Learning-Based Approach for Automatic First-Arrival Picking
    Li, Hanyang
    Sun, Yuhang
    Li, Jiahui
    Li, Hang
    Dong, Hongli
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [37] INTEGRATED PRONUNCIATION LEARNING FOR AUTOMATIC SPEECH RECOGNITION USING PROBABILISTIC LEXICAL MODELING
    Rasipuram, Ramya
    Razavi, Marzieh
    Magimai-Doss, Mathew
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5176 - 5180
  • [38] Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis
    Hsia, Chi-Chun
    Wu, Chung-Hsien
    Wu, Jung-Yun
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (08): : 1994 - 2003
  • [39] Heart Disease Diagnostics Using Meta-Learning-Based Hybrid Feature Selection
    Dissanayake, Kaushalya
    Johar, Md Gapar Md
    APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2024, 2024
  • [40] Meta-learning-based estimation of the barrier layer thickness in the tropical Indian Ocean
    Qi, Jifeng
    Qu, Tangdong
    Yin, Baoshu
    ENVIRONMENTAL RESEARCH COMMUNICATIONS, 2023, 5 (09):