Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

被引：0

作者：

Tu, Tao ^{[1
]}

Chen, Yuan-Jui ^{[1
]}

Liu, Alexander H. ^{[1
]}

Lee, Hung-yi ^{[1
]}

机构：

[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan

来源：

INTERSPEECH 2020 | 2020年

关键词：

multi-speaker speech synthesis; semi-supervised learning; discrete speech representation;

D O I：

10.21437/Interspeech.2020-1824

中图分类号：

R36 [病理学]; R76 [耳鼻咽喉科学];

学科分类号：

100104 ; 100213 ;

摘要：

Recently, end-to-end multi-speaker text-to-speech (TTS) systems gain success in the situation where a lot of high-quality speech plus their corresponding transcriptions are available. However, laborious paired data collection processes prevent many institutes from building multi-speaker TTS systems of great performance. In this work, we propose a semi-supervised learning approach for multi-speaker TTS. A multi-speaker TTS model can learn from the untranscribed audio via the proposed encoder-decoder framework with discrete speech representation. The experiment results demonstrate that with only an hour of paired speech data, whether the paired data is from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices. We found the model can benefit from the proposed semi-supervised learning approach even when part of the unpaired speech data is noisy. In addition, our analysis reveals that different speaker characteristics of the paired data have an impact on the effectiveness of semi-supervised TTS.

引用

页码：3191 / 3195

页数：5

共 50 条

[31] Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Yoon, Hyungchan
Kim, Changhwan
Song, Eunwoo
Yoon, Hyun-Wook
Kang, Hong-Goo
INTERSPEECH 2023, 2023, : 4299 - 4303
[32] Learning Speaker Embedding from Text-to-Speech
Cho, Jaejin
Zelasko, Piotr
Villalba, Jesus
Watanabe, Shinji
Dehak, Najim
INTERSPEECH 2020, 2020, : 3256 - 3260
[33] Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech
Jeong, Myeonghun
Kim, Minchan
Choi, Byoung Jin
Yoon, Jaesam
Jang, Won
Kim, Nam Soo
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1519 - 1530
[34] Multi-speaker Multi-style Text-to-speech Synthesis with Single-speaker Single-style Training Data Scenarios
Xie, Qicong
Li, Tao
Wang, Xinsheng
Wang, Zhichao
Xie, Lei
Yu, Guoqiao
Wan, Guanglu
2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 66 - 70
[35] ZERO-SHOT TEXT-TO-SPEECH SYNTHESIS CONDITIONED USING SELF-SUPERVISED SPEECH REPRESENTATION MODEL
Fujita, Kenichi
Ashihara, Takanori
Kanagawa, Hiroki
Moriya, Takafumi
Ijima, Yusuke
2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023,
[36] Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation
Mitsui, Kentaro
Koriyama, Tomoki
Saruwatari, Hiroshi
SPEECH COMMUNICATION, 2021, 132 : 132 - 145
[37] NNSPEECH: SPEAKER-GUIDED CONDITIONAL VARIATIONAL AUTOENCODER FOR ZERO-SHOT MULTI-SPEAKER TEXT-TO-SPEECH
Zhao, Botao
Zhang, Xulong
Wang, Jianzong
Cheng, Ning
Xiao, Jing
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4293 - 4297
[38] SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Casanova, Edresson
Shulby, Christopher
Golge, Eren
Muller, Nicolas Michael
de Oliveira, Frederico Santos
Candido Junior, Arnaldo
Soares, Anderson da Silva
Aluisio, Sandra Maria
Ponti, Moacir Antonelli
INTERSPEECH 2021, 2021, : 3645 - 3649
[39] Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis
Li, Weiqin
Lei, Shun
Huang, Qiaochu
Zhou, Yixuan
Wu, Zhiyong
Kang, Shiyin
Meng, Helen
INTERSPEECH 2023, 2023, : 3377 - 3381
[40] Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Jia, Ye
Zhang, Yu
Weiss, Ron J.
Wang, Quan
Shen, Jonathan
Ren, Fei
Chen, Zhifeng
Nguyen, Patrick
Pang, Ruoming
Moreno, Ignacio Lopez
Wu, Yonghui
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31

← 1 2 3 4 5 →