Phoneme Hallucinator: One-Shot Voice Conversion via Set Expansion

Authors
Shan, Siyuan [1]
Li, Yang [1]
Banerjee, Amartya [1]
Oliva, Junier B. [1]
Affiliation
[1] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC 27599 USA
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Voice conversion (VC) aims to alter a person's voice so that it sounds like the voice of another person while preserving the linguistic content. Existing methods face a trade-off between content intelligibility and speaker similarity: methods with higher intelligibility usually have lower speaker similarity, while methods with higher speaker similarity usually require plenty of target speaker voice data to achieve high intelligibility. In this work, we propose a novel method, Phoneme Hallucinator, that achieves the best of both worlds. Phoneme Hallucinator is a one-shot VC model; it adopts a novel model to hallucinate diversified and high-fidelity target speaker phonemes based on just a short target speaker recording (e.g., 3 seconds). The hallucinated phonemes are then used to perform neighbor-based voice conversion. Our model is a text-free, any-to-any VC model that requires no text annotations and supports conversion to any unseen speaker. Quantitative and qualitative evaluations show that Phoneme Hallucinator outperforms existing VC methods in both intelligibility and speaker similarity.
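As an illustration of the neighbor-based conversion step mentioned in the abstract, the following minimal sketch replaces each source frame feature with the mean of its k nearest neighbors in a set of target speaker features. It is not the authors' implementation: the function names, the cosine distance, the averaging rule, and the toy 2-D features are all assumptions made for this example, and the hallucination model itself is not reproduced (the target set is simply given as a list, which in the described method would also contain hallucinated features).

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_convert(source_frames, target_set, k=2):
    """Replace each source frame with the mean of its k nearest target features.

    source_frames: list of feature vectors from the source utterance.
    target_set: list of target speaker feature vectors (hypothetically,
        real features plus hallucinated ones).
    """
    converted = []
    for frame in source_frames:
        # Rank all target features by distance to the current source frame.
        nearest = sorted(target_set, key=lambda t: cosine_distance(frame, t))[:k]
        # Average the selected neighbors dimension by dimension.
        converted.append(
            [sum(t[d] for t in nearest) / len(nearest) for d in range(len(frame))]
        )
    return converted


# Toy data: two source frames and four target features in 2-D.
source_frames = [[1.0, 0.0], [0.0, 1.0]]
target_set = [[2.0, 0.0], [2.0, 0.2], [0.0, 3.0], [0.3, 3.0]]
converted = knn_convert(source_frames, target_set, k=2)
```

In this toy setting each converted frame is built entirely from target speaker features, which is why a larger and more diverse target set (the role the abstract assigns to hallucinated phonemes) gives the neighbor search more to choose from.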
Pages: 14910-14918
Number of pages: 9