Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis

Cited by: 1

Authors:

Peng, Yukun [1]

Ling, Zhenhua [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China

Source:

INTERSPEECH 2022 | 2022

Funding:

National Key Research and Development Program of China

Keywords:

text-to-speech; speech synthesis; multilingual; meta-learning;

DOI:

10.21437/Interspeech.2022-831

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper presents a method of decoupled pronunciation and prosody modeling to improve the performance of meta-learning-based multilingual speech synthesis. The baseline meta-learning synthesis method adopts a single text encoder, whose parameters are produced by a parameter generator conditioned on language embeddings, and a single decoder to predict mel-spectrograms for all languages. In contrast, our proposed method uses a two-stream model structure that contains two encoders and two decoders, one pair for pronunciation modeling and one for prosody modeling, on the grounds that pronunciation knowledge and prosody knowledge should be shared in different ways among languages. In our experiments, the proposed method effectively improved the intelligibility and naturalness of multilingual speech synthesis compared with the baseline meta-learning synthesis method.
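As a rough illustration of the baseline idea mentioned in the abstract, the following minimal sketch shows a parameter generator that maps a language embedding to the weights of one encoder layer, so that the same encoder code behaves differently per language. It is not the authors' implementation: the class `ParameterGenerator`, the function `encode`, the layer sizes, and the toy language embeddings are all assumptions made for illustration, and the two-stream pronunciation/prosody structure is not modeled here because the abstract does not specify how the streams are combined.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ParameterGenerator:
    """Hypothetical generator: language embedding -> weights of one linear layer."""

    def __init__(self, lang_dim, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        # One projection row per generated weight (randomly initialised, untrained).
        self.proj = [[rng.uniform(-0.1, 0.1) for _ in range(lang_dim)]
                     for _ in range(in_dim * out_dim)]

    def __call__(self, lang_embedding):
        flat = matvec(self.proj, lang_embedding)
        # Reshape the flat vector into an out_dim x in_dim weight matrix.
        return [flat[r * self.in_dim:(r + 1) * self.in_dim]
                for r in range(self.out_dim)]


def encode(text_features, weights):
    """Apply the generated linear layer to every input frame."""
    return [matvec(weights, frame) for frame in text_features]


# Toy language embeddings (assumed values, not taken from the paper).
LANG_EMBEDDINGS = {"zh": [1.0, 0.0, 0.5], "en": [0.0, 1.0, 0.5]}

generator = ParameterGenerator(lang_dim=3, in_dim=4, out_dim=2)
frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]

# The same frames are encoded with language-specific generated weights.
encoded = {lang: encode(frames, generator(emb))
           for lang, emb in LANG_EMBEDDINGS.items()}
```

In this sketch only the generator's projection is shared across languages; the layer weights themselves are recomputed from each language embedding, which is one common way to share knowledge among languages in a single encoder.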


Pages: 4257-4261

Page count: 5
