ACCENT CONVERSION USING PHONETIC POSTERIORGRAMS

Authors
Zhao, Guanlong [1]
Sonsaat, Sinem [2]
Levis, John [2]
Chukharev-Hudilainen, Evgeny [2]
Gutierrez-Osuna, Ricardo [1]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77843 USA
[2] Iowa State Univ, Dept English, Ames, IA USA
Funding
National Science Foundation (USA)
Keywords
speech synthesis; accent conversion; frame pairing; posteriorgram; acoustic model; voice conversion; foreign accent; speech
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Accent conversion (AC) aims to transform non-native speech to sound as if the speaker had a native accent. This can be achieved by mapping source spectra from a native speaker into the acoustic space of the non-native speaker. In prior work, we proposed an AC approach that matches frames between the two speakers based on their acoustic similarity after compensating for differences in vocal tract length. In this paper, we propose an approach that matches frames between the two speakers based on their phonetic (rather than acoustic) similarity. Namely, we map frames from the two speakers into a phonetic posteriorgram using speaker-independent acoustic models trained on native speech. We evaluate the proposed algorithm on a corpus containing multiple native and non-native speakers. Compared to the previous AC algorithm, the proposed algorithm improves the ratings of acoustic quality (20% increase in mean opinion score) and native accent (69% preference) while retaining the voice identity of the non-native speaker.
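The phonetic frame pairing described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or pairing rule is used, so the symmetric Kullback-Leibler divergence and the nearest-neighbor rule below are assumptions, and the names `symmetric_kl` and `pair_frames` are hypothetical. Each frame is taken to be a posterior probability vector over phonetic classes, as a speaker-independent acoustic model might produce.

```python
import math


def symmetric_kl(p, q, eps=1e-10):
    """Symmetric KL divergence between two posterior vectors.

    Assumed distance measure; the abstract does not name one.
    Equals KL(p||q) + KL(q||p) = sum((p_i - q_i) * log(p_i / q_i)).
    """
    total = 0.0
    for pi, qi in zip(p, q):
        pi = max(pi, eps)  # floor probabilities to avoid log(0)
        qi = max(qi, eps)
        total += (pi - qi) * math.log(pi / qi)
    return total


def pair_frames(native_ppg, nonnative_ppg):
    """For each native frame, return the index of the phonetically
    closest non-native frame (nearest neighbor, an assumed rule)."""
    pairs = []
    for p in native_ppg:
        best = min(
            range(len(nonnative_ppg)),
            key=lambda j: symmetric_kl(p, nonnative_ppg[j]),
        )
        pairs.append(best)
    return pairs


# Toy posteriorgrams: rows are frames, columns are 3 made-up phonetic classes.
native_ppg = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]]
nonnative_ppg = [[0.2, 0.6, 0.2], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]

pairs = pair_frames(native_ppg, nonnative_ppg)
print(pairs)  # prints [1, 0]
```

Because the comparison happens in posterior space rather than on spectra, two frames can pair up even when the speakers' acoustics differ, as long as the acoustic model assigns them similar phonetic content.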
Pages: 5314-5318
Number of pages: 5
Related papers
50 in total
  • [31] A Phonetic Investigation of Intonational Foreign Accent in Mandarin Chinese Learners of German
    Ding, Hongwei
    Jokisch, Oliver
    Hoffmann, Ruediger
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPEECH PROSODY, VOLS I AND II, 2012, : 374 - 377
  • [32] Phonetic parameters and perceptual judgments of accent in English by American and Japanese listeners
    Riney, Timothy J.
    Takagi, Naoyuki
    Inutsuka, Kumiko
    TESOL QUARTERLY, 2005, 39 (03) : 441 - 466
  • [33] Perceptual adaptation to a novel accent: Phonetic category expansion or category shift?
    Melguy, Yevgeniy Vasilyevich
    Johnson, Keith
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2022, 152 (04) : 2090 - 2104
  • [35] On the target of phonetic convergence: Acoustic and linguistic aspects of pitch accent imitation
    Nielsen, Kuniko
    Scarborough, Rebecca
    JOURNAL OF PHONETICS, 2024, 107
  • [36] The effect of sentential context on phonetic categorization is modulated by talker accent and exposure
    Schertz, Jessamyn
    Hawthorne, Kara
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2018, 143 (03) : EL231 - EL236
  • [37] THE PROBLEM OF PHONETIC AMBIGUITIES AND THEIR AUTOMATIC CONVERSION
    ENRIQUEZ, E
    BOLETIN DE LA REAL ACADEMIA ESPANOLA, 1991, 71 (252) : 157 - 183
  • [38] Event Selection from Phone Posteriorgrams Using Matched Filters
    Kintzley, Keith
    Jansen, Aren
    Hermansky, Hynek
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1916 - 1919
  • [39] Recent Advancement in Accent Conversion Using Deep Learning Techniques: A Comprehensive Review
    Chandra, Sabyasachi
    Bharati, Puja
    Prasad, G. Satya
    Pramanik, Debolina
    Das Mandal, Shyamal Kumar
    PROCEEDINGS OF 27TH INTERNATIONAL SYMPOSIUM ON FRONTIERS OF RESEARCH IN SPEECH AND MUSIC, FRSM 2023, 2024, 1455 : 61 - 73
  • [40] Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network
    Jia, Dongya
    Tian, Qiao
    Peng, Kainan
    Li, Jiaxin
    Chen, Yuanzhe
    Ma, Mingbo
    Wang, Yuping
    Wang, Yuxuan
    INTERSPEECH 2023, 2023, : 5476 - 5480