CYBORG SPEECH: DEEP MULTILINGUAL SPEECH SYNTHESIS FOR GENERATING SEGMENTAL FOREIGN ACCENT WITH NATURAL PROSODY

Cited by: 0

Authors:

Henter, Gustav Eje [1]

Lorenzo-Trueba, Jaime [1]

Wang, Xin [1]

Kondo, Mariko [2]

Yamagishi, Junichi [1,3]

Affiliations:

[1] Natl Inst Informat, Tokyo, Japan

[2] Waseda Univ, Tokyo, Japan

[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Multilingual speech synthesis; phonetic manipulation; foreign accent; DNN; RECURRENT NEURAL-NETWORK; ENGLISH; INTELLIGIBILITY;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

We describe a new application of deep-learning-based speech synthesis, namely multilingual speech synthesis for generating controllable foreign accent. Specifically, we train a DBLSTM-based acoustic model on non-accented multilingual speech recordings from a speaker native in several languages. By copying durations and pitch contours from a pre-recorded utterance of the desired prompt, natural prosody is achieved. We call this paradigm "cyborg speech" as it combines human and machine speech parameters. Segmentally accented speech is produced by interpolating specific quin-phone linguistic features towards phones from the other language that represent non-native mispronunciations. Experiments on synthetic American-English-accented Japanese speech show that subjective synthesis quality matches monolingual synthesis, that natural pitch is maintained, and that naturalistic phone substitutions generate output that is perceived as having an American foreign accent, even though only non-accented training data was used.


Pages: 4799-4803

Page count: 5

