A STUDY ON COMBINING NON-PARALLEL AND PARALLEL METHODOLOGIES FOR MANDARIN-ENGLISH CROSS-LINGUAL VOICE CONVERSION

Cited by: 0
Authors
You, Chang Huai [1]
Dong, Minghui [1]
Affiliation
[1] A*STAR, Institute for Infocomm Research, Singapore, Singapore
Keywords
non-parallel voice conversion; parallel voice conversion; generative adversarial network; text-to-speech; phonetic posteriorgrams; neural networks
DOI
10.1109/ICASSP48485.2024.10446264
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we propose a cross-lingual voice conversion (VC) scheme that combines non-parallel and parallel methodologies. The goal of cross-lingual VC is to transform the voice of a speaker in one language dataset into the voice of another speaker in a dataset of a different language. First, two non-parallel methods, CycleGAN-VC2 and phonetic posteriorgram (PPG) based VC, are investigated separately. Second, two different parallel VC systems are developed to enhance the quality of the converted speech spectrogram; to train them, the output speech of the non-parallel VC is paired with the corresponding original speech to form parallel data. On Mandarin-English bilingual databases, the proposed VC scheme improves speech naturalness and speaker similarity compared with the baseline non-parallel methods.
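The pair-forming step described in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation, and several details are assumptions: the function names (form_pseudo_parallel_pairs, train_linear_parallel_vc) are hypothetical; the pairing direction (non-parallel VC output as input, original speech as target) is one plausible reading of the abstract; the non-parallel VC is assumed to preserve the frame count, so each pair needs no time alignment; and a least-squares affine mapping stands in for the neural parallel VC systems the paper actually uses.

```python
import numpy as np


def form_pseudo_parallel_pairs(original_utts, nonparallel_vc):
    """Pair each original utterance with its non-parallel VC output.

    Assumption: nonparallel_vc keeps the frame count unchanged, so each
    (converted, original) pair is already frame-aligned.
    """
    pairs = []
    for original in original_utts:
        converted = nonparallel_vc(original)
        if converted.shape != original.shape:
            raise ValueError("frame counts differ; a time-alignment step (e.g. DTW) would be needed")
        pairs.append((converted, original))
    return pairs


def train_linear_parallel_vc(pairs):
    """Fit an affine map from converted frames to original frames by least squares.

    Hypothetical stand-in for a neural parallel VC model.
    """
    x = np.concatenate([converted for converted, _ in pairs], axis=0)
    y = np.concatenate([original for _, original in pairs], axis=0)
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])  # append a bias column
    weights, *_ = np.linalg.lstsq(x1, y, rcond=None)

    def convert(features):
        # Apply the learned affine map frame by frame.
        return np.hstack([features, np.ones((features.shape[0], 1))]) @ weights

    return convert


# Toy example: 4-dimensional "spectral" frames, with an invertible affine
# distortion playing the role of the non-parallel VC.
rng = np.random.default_rng(0)
distortion = np.eye(4) + 0.1 * rng.standard_normal((4, 4))


def toy_nonparallel_vc(frames):
    return frames @ distortion + 0.5


utterances = [rng.standard_normal((n_frames, 4)) for n_frames in (50, 80, 30)]
pairs = form_pseudo_parallel_pairs(utterances, toy_nonparallel_vc)
parallel_vc = train_linear_parallel_vc(pairs)

held_out = rng.standard_normal((20, 4))
recovered = parallel_vc(toy_nonparallel_vc(held_out))
print(np.allclose(recovered, held_out))  # True: the toy distortion is exactly invertible
```

Real non-parallel VC output is not an exact affine function of the original speech, so a practical parallel model would need to be nonlinear; the linear fit only keeps the sketch short.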
Pages: 10491-10495
Number of pages: 5