A STUDY ON COMBINING NON-PARALLEL AND PARALLEL METHODOLOGIES FOR MANDARIN-ENGLISH CROSS-LINGUAL VOICE CONVERSION

Cited by: 0
Authors
You, Chang Huai [1]
Dong, Minghui [1]
Affiliations
[1] A*STAR, Institute for Infocomm Research, Singapore, Singapore
Keywords
non-parallel voice conversion; parallel voice conversion; generative adversarial network; text-to-speech; phonetic posterior-grams; neural networks
DOI
10.1109/ICASSP48485.2024.10446264
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a cross-lingual voice conversion (VC) scheme that combines non-parallel and parallel methodologies. The goal of cross-lingual VC is to transform the voice of one speaker from a dataset in one language into the voice of another speaker from a dataset in a different language. First, two non-parallel methods are investigated separately: CycleGAN-VC2 and phonetic posteriorgram (PPG) VC. Second, two different parallel VC systems are developed to enhance the quality of the converted speech spectrogram; here, the output speech from the non-parallel VC is paired with the corresponding original speech to form a parallel pair. On Mandarin-English bilingual databases, the proposed VC scheme improves speech naturalness and speaker similarity compared with the baseline non-parallel methods.
Pages: 10491-10495
Number of pages: 5
Related papers
50 in total
  • [31] C-BiLDA: extracting cross-lingual topics from non-parallel texts by distinguishing shared from unshared content
    Geert Heyman
    Ivan Vulić
    Marie-Francine Moens
    Data Mining and Knowledge Discovery, 2016, 30 : 1299 - 1323
  • [32] Non-Parallel Any-to-Many Voice Conversion by Replacing Speaker Statistics
    Liu, Yufei
    Yu, Chengzhu
    Shuai, Wang
    Yang, Zhenchuan
    Chao, Yang
    Zhang, Weibin
    INTERSPEECH 2021, 2021, : 1369 - 1373
  • [33] Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning
    Zhang, Jing-Xuan
    Ling, Zhen-Hua
    Dai, Li-Rong
    INTERSPEECH 2020, 2020, : 771 - 775
  • [34] VAW-GAN for Singing Voice Conversion with Non-parallel Training Data
    Lu, Junchen
    Zhou, Kun
    Sisman, Berrak
    Li, Haizhou
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 514 - 519
  • [35] MASKCYCLEGAN-VC: LEARNING NON-PARALLEL VOICE CONVERSION WITH FILLING IN FRAMES
    Kaneko, Takuhiro
    Kameoka, Hirokazu
    Tanaka, Kou
    Hojo, Nobukatsu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5919 - 5923
  • [36] Non-parallel Voice Conversion Based on Perceptual Star Generative Adversarial Network
    Li, Yanping
    Qiu, Xiangtian
    Cao, Pan
    Zhang, Yan
    Bao, Bingkun
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (08) : 4632 - 4648
  • [37] Non-parallel Voice Conversion Based on Perceptual Star Generative Adversarial Network
    Yanping Li
    Xiangtian Qiu
    Pan Cao
    Yan Zhang
    Bingkun Bao
    Circuits, Systems, and Signal Processing, 2022, 41 : 4632 - 4648
  • [38] Measuring Chinese-English Cross-Lingual Word Similarity with HowNet and Parallel Corpus
    Xia, Yunqing
    Zhao, Taotao
    Yao, Jianmin
    Jin, Peng
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PT II, 2011, 6609 : 221 - +
  • [39] C-CycleTransGAN: A Non-parallel Controllable Cross-gender Voice Conversion Model with CycleGAN and Transformer
    Fu, Changzeng
    Liu, Chaoran
    Ishi, Carlos Toshinori
    Ishiguro, Hiroshi
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 553 - 559
  • [40] Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition
    Takashima, Yuki
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (01)