Expressive Machine Dubbing Through Phrase-level Cross-lingual Prosody Transfer

Cited by: 0
Authors
Swiatkowski, Jakub [1]
Wang, Duo [1]
Babianski, Mikolaj [1]
Coccia, Giuseppe [1]
Tobing, Patrick Lumban [1]
Vipperla, Ravichander [1]
Klimkov, Viacheslav [1]
Pollet, Vincent [1]
Affiliations
[1] Amazon Sci, Seattle, WA 98109 USA
Source
Keywords
speech synthesis; cross-lingual; prosody transfer; multi-lingual; end-to-end; machine dubbing;
DOI
10.21437/Interspeech.2023-441
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Speech generation for machine dubbing adds complexity to conventional Text-To-Speech solutions as the generated output is required to match the expressiveness, emotion and speaking rate of the source content. Capturing and transferring details and variations in prosody is a challenge. We introduce phrase-level cross-lingual prosody transfer for expressive multi-lingual machine dubbing. The proposed phrase-level prosody transfer delivers a significant 6.2% MUSHRA score increase over a baseline with utterance-level global prosody transfer, thereby closing the gap between the baseline and expressive human dubbing by 23.2%, while preserving intelligibility of the synthesised speech.
Pages: 5546 - 5550
Number of pages: 5
Related papers
50 in total
  • [1] Cross-lingual Prosody Transfer for Expressive Machine Dubbing
    Swiatkowski, Jakub
    Wang, Duo
    Babianski, Mikolaj
    Tobing, Patrick Lumban
    Vipperla, Ravichander
    Pollet, Vincent
    INTERSPEECH 2023, 2023: 4838 - 4842
  • [2] Cross-Lingual Phrase Retrieval
    Zheng, Heqi
    Zhang, Xiao
    Chi, Zewen
    Huang, Heyan
    Yan, Tan
    Lan, Tian
    Wei, Wei
    Mao, Xian-Ling
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 4193 - 4204
  • [3] Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer
    Secujski, Milan
    Gerazov, Branislav
    Csapo, Tamas Gabor
    Delic, Vlado
    Garner, Philip N.
    Gjoreski, Aleksandar
    Guennec, David
    Ivanovski, Zoran
    Melov, Aleksandar
    Nemeth, Geza
    Stojkovic, Ana
    Szaszak, Gyoergy
    SPEECH AND COMPUTER, 2016, 9811 : 199 - 206
  • [4] Adversarial and Sequential Training for Cross-lingual Prosody Transfer TTS
    Kim, Min-Kyung
    Chang, Joon-Hyuk
    INTERSPEECH 2022, 2022: 4556 - 4560
  • [5] Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model
    Lee, Hoyeon
    Yoon, Hyun-Wook
    Kim, Jong-Hwan
    Kim, Jae-Min
    INTERSPEECH 2023, 2023: 611 - 615
  • [6] A Typological Outlier Explained: Stress and Phrase-Level Prosody in Plains Cree
    Schmirler, Katherine
    Arnhold, Anja
    INTERNATIONAL JOURNAL OF AMERICAN LINGUISTICS, 2025, 91 (01) : 97 - 147
  • [7] Unsupervised Cross-Lingual Mapping for Phrase Embedding Spaces
    Ayana, Abraham G.
    Cao, Hailong
    Zhao, Tiejun
    ADVANCES IN INFORMATION AND COMMUNICATION, VOL 2, 2020, 1130 : 512 - 524
  • [8] Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing
    Li, Jingbei
    Li, Sipan
    Chen, Ping
    Zhang, Luwen
    Meng, Yi
    Wu, Zhiyong
    Meng, Helen
    Tian, Qiao
    Wang, Yuping
    Wang, Yuxuan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 517 - 528
  • [9] Exploring Cross-Lingual Transfer Learning with Unsupervised Machine Translation
    Wang, Chao
    Gaspers, Judith
    Do, Quynh
    Jiang, Hui
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 2011 - 2020
  • [10] Machine-Created Universal Language for Cross-Lingual Transfer
    Liang, Yaobo
    Zhu, Quanzhi
    Zhao, Junhe
    Duan, Nan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024: 18617 - 18625