A STUDY ON COMBINING NON-PARALLEL AND PARALLEL METHODOLOGIES FOR MANDARIN-ENGLISH CROSS-LINGUAL VOICE CONVERSION

Cited by: 0
Authors
You, Chang Huai [1]
Dong, Minghui [1]
Affiliations
[1] ASTAR, Inst Infocomm Res, Singapore, Singapore
Keywords
non-parallel voice conversion; parallel voice conversion; generative adversarial network; text-to-speech; phonetic posterior-grams; NEURAL-NETWORKS
DOI
10.1109/ICASSP48485.2024.10446264
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a cross-lingual voice conversion (VC) scheme leveraging non-parallel and parallel methodologies. The goal of cross-lingual VC is to transform the voice of one speaker from a language dataset into the voice of another speaker from a different language dataset. First, two non-parallel methods, CycleGAN-VC2 and phonetic posteriorgram (PPG) VC, are investigated separately. Second, two different parallel VC systems are developed to enhance the quality of the converted speech spectrogram, where the output speech from the non-parallel VC is used to form the parallel pair with the corresponding original speech. Focusing on Mandarin-English bilingual databases, the proposed VC scheme improves speech naturalness and speaker similarity compared with the baseline non-parallel methods.
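The pairing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `build_pseudo_parallel_pairs`, `toy_convert`, and the list-of-frames representation are hypothetical names and choices made for illustration, and the sketch assumes a frame-wise converter whose output has the same number of frames as its input. The abstract does not say which side of the pair serves as input or target for the parallel VC system, so the tuple order here is arbitrary.

```python
from typing import Callable, List, Sequence, Tuple

# One utterance is represented as a list of spectral frames (hypothetical format).
Frames = List[List[float]]


def build_pseudo_parallel_pairs(
    utterances: Sequence[Frames],
    nonparallel_convert: Callable[[Frames], Frames],
) -> List[Tuple[Frames, Frames]]:
    """Pair each original utterance with its non-parallel VC output."""
    pairs = []
    for original in utterances:
        converted = nonparallel_convert(original)
        # Assumption: conversion is frame-wise, so both sides have equal
        # length and no extra time alignment is applied in this sketch.
        if len(converted) != len(original):
            raise ValueError("converted and original utterances differ in length")
        pairs.append((converted, original))
    return pairs


def toy_convert(frames: Frames) -> Frames:
    """Stand-in for a non-parallel converter; it only shifts each value by 1.0."""
    return [[value + 1.0 for value in frame] for frame in frames]


corpus = [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0]]]
pairs = build_pseudo_parallel_pairs(corpus, toy_convert)
```

In a real system the stand-in converter would be replaced by a trained non-parallel model such as CycleGAN-VC2 or a PPG-based converter, and the resulting pairs would feed the training of a parallel VC model.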
Pages: 10491-10495
Number of pages: 5