CYBORG SPEECH: DEEP MULTILINGUAL SPEECH SYNTHESIS FOR GENERATING SEGMENTAL FOREIGN ACCENT WITH NATURAL PROSODY

Authors
Henter, Gustav Eje [1]
Lorenzo-Trueba, Jaime [1]
Wang, Xin [1]
Kondo, Mariko [2]
Yamagishi, Junichi [1,3]
Affiliations
[1] Natl Inst Informat, Tokyo, Japan
[2] Waseda Univ, Tokyo, Japan
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
Multilingual speech synthesis; phonetic manipulation; foreign accent; DNN; RECURRENT NEURAL-NETWORK; ENGLISH; INTELLIGIBILITY;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We describe a new application of deep-learning-based speech synthesis, namely multilingual speech synthesis for generating controllable foreign accent. Specifically, we train a DBLSTM-based acoustic model on non-accented multilingual speech recordings from a speaker native in several languages. By copying durations and pitch contours from a pre-recorded utterance of the desired prompt, natural prosody is achieved. We call this paradigm "cyborg speech" as it combines human and machine speech parameters. Segmentally accented speech is produced by interpolating specific quin-phone linguistic features towards phones from the other language that represent non-native mispronunciations. Experiments on synthetic American-English-accented Japanese speech show that subjective synthesis quality matches monolingual synthesis, that natural pitch is maintained, and that naturalistic phone substitutions generate output that is perceived as having an American foreign accent, even though only non-accented training data was used.
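The feature interpolation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each phone in the quin-phone context is encoded as a one-hot vector and that accent strength is a single weight `alpha`; the phone inventory, the function names, and the encoding are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the inventory and encoding below are assumptions,
# not the feature set used by the authors.
PHONES = ["a", "i", "u", "r_ja", "r_en"]  # hypothetical shared phone inventory


def one_hot(phone):
    """Encode a phone label as a one-hot list over PHONES."""
    return [1.0 if p == phone else 0.0 for p in PHONES]


def interpolate(native, foreign, alpha):
    """Blend the native phone encoding towards the foreign one.

    alpha = 0.0 keeps the native phone, alpha = 1.0 substitutes it fully.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    a, b = one_hot(native), one_hot(foreign)
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def accent_quinphone(quinphone, position, foreign, alpha):
    """Build the input vector for a five-phone context window.

    Only the phone at `position` is interpolated; the other four
    context phones keep their plain one-hot encoding.
    """
    features = []
    for i, phone in enumerate(quinphone):
        if i == position:
            features.extend(interpolate(phone, foreign, alpha))
        else:
            features.extend(one_hot(phone))
    return features


if __name__ == "__main__":
    context = ["a", "i", "r_ja", "u", "a"]
    print(accent_quinphone(context, 2, "r_en", 0.5))
```

In this sketch, intermediate values of `alpha` give a graded mix of the two phone encodings, which is one way to read "controllable" accent strength; how the trained acoustic model responds to such blended inputs is something only the paper's experiments can establish.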
Pages: 4799-4803
Number of pages: 5
Related papers
50 in total
  • [41] Natural Prosody Generation in TTS for Marathi Speech Signal
    Repe, Madhavi R.
    Shirbahadurkar, S. D.
    Desai, Smita
    2010 INTERNATIONAL CONFERENCE ON SIGNAL ACQUISITION AND PROCESSING: ICSAP 2010, PROCEEDINGS, 2010, : 358 - 361
  • [42] Speech Modification for Prosody Conversion in Expressive Marathi Text-to-Speech Synthesis
    Anil, Manjare Chandraprabha
    Shirbahadurkar, S. D.
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014, : 56 - 58
  • [43] Automatic accent classification of foreign accented Australian English speech
    Kumpf, K
    King, RW
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1740 - 1743
  • [44] A case of foreign accent syndrome (FAS) and the effects of speech therapy on it
    Khanahmadi, S.
    Daun, R.
    Pooyania, S.
    STROKE, 2011, 42 (11) : E618 - E618
  • [45] A rare neurologically originated speech disorder: Foreign accent syndrome
    González-Alvarez, J
    Parcet-Ibars, MA
    Avila, C
    Geffner-Sclarsky, D
    REVISTA DE NEUROLOGIA, 2003, 36 (03) : 227 - 234
  • [46] Foreign accent, comprehensibility, and intelligibility in the speech of second language learners
    Munro, MJ
    Derwing, TM
    LANGUAGE LEARNING, 1999, 49 : 285 - 310
  • [47] Evaluations of foreign accent in a purist speech community The case of Iceland
    Bade, Stefanie
    LANGUAGE VARIATION - EUROPEAN PERSPECTIVES VII, 2019, 22 : 53 - 70
  • [48] Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams
    Zhao, Guanlong
    Ding, Shaojin
    Gutierrez-Osuna, Ricardo
    INTERSPEECH 2019, 2019, : 2843 - 2847
  • [49] STUDIES OF ARTICULATORY TIMING IN NORMAL AND FOREIGN ACCENT SYNDROME SPEECH
    BOATMAN, D
    GORDON, B
    STONE, M
    ANDERSON, S
    BRAIN AND LANGUAGE, 1994, 47 (03) : 549 - 552
  • [50] Deep Segmental Neural Networks for Speech Recognition
    Abdel-Hamid, Ossama
    Deng, Li
    Yu, Dong
    Jiang, Hui
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1848 - 1852