Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

Cited by: 0

Authors:

Skerry-Ryan, R. J. [1]

Battenberg, Eric [1]

Xiao, Ying [1]

Wang, Yuxuan [1]

Stanton, Daisy [1]

Shor, Joel [1]

Weiss, Ron J. [1]

Clark, Rob [1]

Saurous, Rif A. [1]

Affiliations:

[1] Google Inc, Mountain View, CA 94043 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80 | 2018 / Vol. 80

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody. We show that conditioning Tacotron on this learned embedding space results in synthesized audio that matches the prosody of the reference signal with fine time detail even when the reference and synthesis speakers are different. Additionally, we show that a reference prosody embedding can be used to synthesize text that is different from that of the reference utterance. We define several quantitative and subjective metrics for evaluating prosody transfer, and report results with accompanying audio samples from single-speaker and 44-speaker Tacotron models on a prosody transfer task.

Pages: 10

50 items in total

[41] Analysis of Pronunciation Learning in End-to-End Speech Synthesis
Taylor, Jason
Richmond, Korin
INTERSPEECH 2019, 2019, : 2070 - 2074
[42] Acoustic Word Embeddings for End-to-End Speech Synthesis
Shen, Feiyu
Du, Chenpeng
Yu, Kai
APPLIED SCIENCES-BASEL, 2021, 11 (19):
[43] End-to-end Speech Synthesis for Tibetan Lhasa Dialect
Luo, Lisai
Li, Guanyu
Gong, Chunwei
Ding, Hailan
2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
[44] End-to-End Speech Synthesis for Bangla with Text Normalization
Pial, Tanzir Islam
Aunti, Shahreen Salim
Ahmed, Shabbir
Heickal, Hasnain
2018 5TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE/ INTELLIGENCE AND APPLIED INFORMATICS (CSII 2018), 2018, : 66 - 71
[45] Investigation of Transfer Learning for End-to-End Russian Speech Recognition
Kipyatkova, Irina
SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 349 - 357
[46] Emphatic Speech Synthesis and Control Based on Characteristic Transferring in End-to-End Speech Synthesis
Wang, Mu
Wu, Zhiyong
Wu, Xixin
Meng, Helen
Kang, Shiyin
Jia, Jia
Cai, Lianhong
2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
[47] Towards End-to-End Raw Audio Music Synthesis
Eppe, Manfred
Alpay, Tayfun
Wermter, Stefan
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT III, 2018, 11141 : 137 - 146
[48] Towards Paralinguistic-Only Speech Representations for End-to-End Speech Emotion Recognition
Ioannides, Georgios
Owen, Michael
Fletcher, Andrew
Rozgic, Viktor
Wang, Chao
INTERSPEECH 2023, 2023, : 1853 - 1857
[49] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
Zhang, Ying
Pezeshki, Mohammad
Brakel, Philemon
Zhang, Saizheng
Laurent, Cesar
Bengio, Yoshua
Courville, Aaron
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
[50] Exploring end-to-end framework towards Khasi speech recognition system
Bronson Syiem
L. Joyprakash Singh
International Journal of Speech Technology, 2021, 24 : 419 - 424