Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

Cited by: 0
Authors
Skerry-Ryan, R. J. [1]
Battenberg, Eric [1]
Xiao, Ying [1]
Wang, Yuxuan [1]
Stanton, Daisy [1]
Shor, Joel [1]
Weiss, Ron J. [1]
Clark, Rob [1]
Saurous, Rif A. [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80 | 2018 / Vol. 80
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody. We show that conditioning Tacotron on this learned embedding space results in synthesized audio that matches the prosody of the reference signal with fine time detail even when the reference and synthesis speakers are different. Additionally, we show that a reference prosody embedding can be used to synthesize text that is different from that of the reference utterance. We define several quantitative and subjective metrics for evaluating prosody transfer, and report results with accompanying audio samples from single-speaker and 44-speaker Tacotron models on a prosody transfer task.
Pages: 10
Related papers
50 in total
  • [41] Analysis of Pronunciation Learning in End-to-End Speech Synthesis
    Taylor, Jason
    Richmond, Korin
    INTERSPEECH 2019, 2019, : 2070 - 2074
  • [42] Acoustic Word Embeddings for End-to-End Speech Synthesis
    Shen, Feiyu
    Du, Chenpeng
    Yu, Kai
    APPLIED SCIENCES-BASEL, 2021, 11 (19):
  • [43] End-to-end Speech Synthesis for Tibetan Lhasa Dialect
    Luo, Lisai
    Li, Guanyu
    Gong, Chunwei
    Ding, Hailan
    2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [44] End-to-End Speech Synthesis for Bangla with Text Normalization
    Pial, Tanzir Islam
    Aunti, Shahreen Salim
    Ahmed, Shabbir
    Heickal, Hasnain
    2018 5TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE/ INTELLIGENCE AND APPLIED INFORMATICS (CSII 2018), 2018, : 66 - 71
  • [45] Investigation of Transfer Learning for End-to-End Russian Speech Recognition
    Kipyatkova, Irina
    SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 349 - 357
  • [46] Emphatic Speech Synthesis and Control Based on Characteristic Transferring in End-to-End Speech Synthesis
    Wang, Mu
    Wu, Zhiyong
    Wu, Xixin
    Meng, Helen
    Kang, Shiyin
    Jia, Jia
    Cai, Lianhong
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [47] Towards End-to-End Raw Audio Music Synthesis
    Eppe, Manfred
    Alpay, Tayfun
    Wermter, Stefan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT III, 2018, 11141 : 137 - 146
  • [48] Towards Paralinguistic-Only Speech Representations for End-to-End Speech Emotion Recognition
    Ioannides, Georgios
    Owen, Michael
    Fletcher, Andrew
    Rozgic, Viktor
    Wang, Chao
    INTERSPEECH 2023, 2023, : 1853 - 1857
  • [49] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
    Zhang, Ying
    Pezeshki, Mohammad
    Brakel, Philemon
    Zhang, Saizheng
    Laurent, Cesar
    Bengio, Yoshua
    Courville, Aaron
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
  • [50] Exploring end-to-end framework towards Khasi speech recognition system
    Bronson Syiem
    L. Joyprakash Singh
    International Journal of Speech Technology, 2021, 24 : 419 - 424