ACCENT CONVERSION USING PHONETIC POSTERIORGRAMS

Cited by: 0
Authors
Zhao, Guanlong [1 ]
Sonsaat, Sinem [2 ]
Levis, John [2 ]
Chukharev-Hudilainen, Evgeny [2 ]
Gutierrez-Osuna, Ricardo [1 ]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77843 USA
[2] Iowa State Univ, Dept English, Ames, IA USA
Funding
U.S. National Science Foundation
Keywords
speech synthesis; accent conversion; frame pairing; posteriorgram; acoustic model; VOICE CONVERSION; FOREIGN ACCENT; SPEECH;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Accent conversion (AC) aims to transform non-native speech to sound as if the speaker had a native accent. This can be achieved by mapping source spectra from a native speaker into the acoustic space of the non-native speaker. In prior work, we proposed an AC approach that matches frames between the two speakers based on their acoustic similarity after compensating for differences in vocal tract length. In this paper, we propose an approach that matches frames between the two speakers based on their phonetic (rather than acoustic) similarity. Namely, we map frames from the two speakers into a phonetic posteriorgram using speaker-independent acoustic models trained on native speech. We evaluate the proposed algorithm on a corpus containing multiple native and non-native speakers. Compared to the previous AC algorithm, the proposed algorithm improves the ratings of acoustic quality (20% increase in mean opinion score) and native accent (69% preference) while retaining the voice identity of the non-native speaker.
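The core idea in the abstract, pairing each frame of one speaker with the phonetically closest frame of the other via posteriorgram similarity, can be illustrated with a minimal sketch. Assumptions not in the record: posteriorgrams are given as NumPy arrays of per-frame phone posteriors, and cosine similarity stands in for whatever distance the paper actually uses; the function name `pair_frames` is hypothetical.

```python
import numpy as np

def pair_frames(src_ppg, tgt_ppg):
    """Pair each source frame with the target frame whose phonetic
    posteriorgram is most similar.

    src_ppg: (N, P) array, each row a posterior distribution over P phone classes
    tgt_ppg: (M, P) array, same layout for the other speaker
    Returns a length-N array giving, for each source frame, the index of the
    best-matching target frame (cosine similarity as the similarity measure).
    """
    # Normalize rows to unit length so the dot product equals cosine similarity.
    src = src_ppg / np.linalg.norm(src_ppg, axis=1, keepdims=True)
    tgt = tgt_ppg / np.linalg.norm(tgt_ppg, axis=1, keepdims=True)
    sim = src @ tgt.T               # (N, M) pairwise cosine similarities
    return sim.argmax(axis=1)       # best-matching target frame per source frame

# Toy posteriorgrams over 3 phone classes: source frame 0 is phonetically
# closest to target frame 1, and source frame 1 to target frame 0.
src = np.array([[0.90, 0.05, 0.05],
                [0.10, 0.80, 0.10]])
tgt = np.array([[0.10, 0.85, 0.05],
                [0.88, 0.07, 0.05]])
print(pair_frames(src, tgt))  # [1 0]
```

Because the pairing is done in phonetic-posterior space rather than acoustic space, it is insensitive to speaker-dependent spectral detail, which is the contrast with the acoustic-similarity matching the abstract describes as prior work.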
Pages: 5314-5318
Page count: 5
Related Papers
50 records
  • [1] Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams
    Zhao, Guanlong
    Ding, Shaojin
    Gutierrez-Osuna, Ricardo
    INTERSPEECH 2019, 2019, : 2843 - 2847
  • [2] A COMPACT FRAMEWORK FOR VOICE CONVERSION USING WAVENET CONDITIONED ON PHONETIC POSTERIORGRAMS
    Lu, Hui
    Wu, Zhiyong
    Li, Runnan
    Kang, Shiyin
    Jia, Jia
    Meng, Helen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6810 - 6814
  • [3] Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion
    Zhao, Guanlong
    Gutierrez-Osuna, Ricardo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (10) : 1649 - 1660
  • [4] One-shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams
    Mohammadi, Seyed Hamidreza
    Kim, Taehwan
    INTERSPEECH 2019, 2019, : 704 - 708
  • [5] Personalized, Cross-lingual TTS Using Phonetic Posteriorgrams
    Sun, Lifa
    Wang, Hao
    Kang, Shiyin
    Li, Kun
    Meng, Helen
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 322 - 326
  • [6] NON-PARALLEL VOICE CONVERSION USING VARIATIONAL AUTOENCODERS CONDITIONED BY PHONETIC POSTERIORGRAMS AND D-VECTORS
    Saito, Yuki
    Ijima, Yusuke
    Nishida, Kyosuke
    Takamichi, Shinnosuke
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5274 - 5278
  • [7] PHONETIC POSTERIORGRAMS FOR MANY-TO-ONE VOICE CONVERSION WITHOUT PARALLEL DATA TRAINING
    Sun, Lifa
    Li, Kun
    Wang, Hao
    Kang, Shiyin
    Meng, Helen
    2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2016,
  • [8] Jointly Trained Conversion Model and WaveNet Vocoder for Non-parallel Voice Conversion using Mel-spectrograms and Phonetic Posteriorgrams
    Liu, Songxiang
    Cao, Yuewen
    Wu, Xixin
    Sun, Lifa
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2019, 2019, : 714 - 718
  • [9] Joint Adversarial Training of Speech Recognition and Synthesis Models for Many-to-One Voice Conversion Using Phonetic Posteriorgrams
    Saito, Yuki
    Akuzawa, Kei
    Tachibana, Kentaro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (09) : 1978 - 1987
  • [10] HIGH-FIDELITY NEURAL PHONETIC POSTERIORGRAMS
    Churchwell, Cameron
    Morrison, Max
    Pardo, Bryan
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 823 - 827