Voice Conversion Based on Unified Dictionary with Clustered Features Between Non-parallel Corpus

Cited by: 1
Authors
Jin, Hui [1]
Yu, Yi-Biao [1]
Affiliation
[1] Soochow Univ, Sch Elect & Informat Engn, Suzhou 215000, Peoples R China
Keywords
Voice conversion; Clustered features; Non-negative matrix factorization; Unified dictionary
DOI
10.1109/ICNISC.2018.00052
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Non-negative matrix factorization (NMF) has recently been widely applied to exemplar-based voice conversion (VC). Compared with conventional statistical Gaussian mixture model (GMM)-based VC, it offers better noise robustness and naturalness of the converted voice. However, it requires parallel training data from the source and target speakers, so it cannot realize voice conversion between arbitrary speakers, especially when the target speaker's corpus is inadequate. In this paper, we propose a novel algorithm that clusters the spectral features in a high-dimensional space to construct a unified dictionary and introduces a mapping matrix between the source and target sparse coefficients. Experimental results show that the average cepstral distortion is 0.833, about 4.3% lower than that of the conventional NMF-based method. Subjective evaluations, including ABX and MOS tests, are also discussed; they indicate that the speech quality of the proposed method is noticeably better than that of conventional NMF. Moreover, the target speaker's spectra need not even be included in the training set.
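The abstract does not spell out the algorithm, so the following is only a minimal NumPy sketch of the general pipeline it describes, not the authors' method. Assumptions: plain k-means stands in for the clustering step that builds the unified dictionary; activations (sparse coefficients) are estimated with Lee-Seung multiplicative NMF updates under a Euclidean cost with the dictionary held fixed and no sparsity penalty; and the mapping matrix is fitted by ridge regression on column-aligned source/target activations. All function names (`kmeans_dictionary`, `nmf_activations`, `fit_mapping`, `convert`) and parameters (`k`, `iters`, `lam`) are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards the multiplicative update against division by zero


def kmeans_dictionary(V, k, iters=50, seed=0):
    """Cluster the frames (columns) of a non-negative feature matrix V (d x n)
    with plain k-means; the k centroids form a d x k dictionary. Centroids are
    means of non-negative frames, so the dictionary stays non-negative."""
    rng = np.random.default_rng(seed)
    W = V[:, rng.choice(V.shape[1], size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance of every frame to every centroid: (k x n).
        dist = ((V[:, None, :] - W[:, :, None]) ** 2).sum(axis=0)
        labels = dist.argmin(axis=0)
        for j in range(k):
            members = V[:, labels == j]
            if members.shape[1] > 0:  # keep the old centroid if a cluster empties
                W[:, j] = members.mean(axis=1)
    return W


def nmf_activations(V, W, iters=200):
    """Estimate non-negative activations H (k x n) so that V ~ W @ H, keeping
    the dictionary W fixed (Lee-Seung multiplicative update, Euclidean cost,
    no sparsity penalty)."""
    H = np.ones((W.shape[1], V.shape[1]))
    WtV, WtW = W.T @ V, W.T @ W
    for _ in range(iters):
        H *= WtV / (WtW @ H + EPS)
    return H


def fit_mapping(H_src, H_tgt, lam=1e-3):
    """Ridge-regression mapping matrix M with H_tgt ~ M @ H_src, i.e.
    M = H_tgt H_src^T (H_src H_src^T + lam I)^-1. Assumes column-aligned
    activation pairs, which is an illustrative simplification."""
    A = H_src @ H_src.T + lam * np.eye(H_src.shape[0])
    return np.linalg.solve(A, H_src @ H_tgt.T).T


def convert(V_src, W, M):
    """Encode source frames on the unified dictionary, map the activations,
    clip negatives introduced by the linear map, and decode with the same W."""
    H_conv = np.maximum(M @ nmf_activations(V_src, W), 0.0)
    return W @ H_conv


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    src = rng.random((20, 300))  # random stand-ins for source spectral frames
    tgt = rng.random((20, 300))  # random stand-ins for target spectral frames
    W = kmeans_dictionary(np.hstack([src, tgt]), k=16)
    M = fit_mapping(nmf_activations(src, W), nmf_activations(tgt, W))
    print(convert(src, W, M).shape)  # (20, 300)
```

Because the dictionary is shared by both speakers, speaker identity in this sketch lives entirely in the activations, which is why a single matrix acting on the coefficients can carry the conversion. The ridge fit needs aligned activation pairs; how the paper estimates its mapping matrix from non-parallel data is not stated in the abstract.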
Pages: 229-232
Number of pages: 4
Related papers
50 in total
  • [21] Non-parallel Voice Conversion using Generative Adversarial Networks
    Hasunuma, Yuta
    Hirayama, Chiaki
    Kobayashi, Masayuki
    Nagao, Tomoharu
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 1635 - 1640
  • [22] StyleVC: Non-Parallel Voice Conversion with Adversarial Style Generalization
    Hwang, In-Sun
    Lee, Sang-Hoon
    Lee, Seong-Whan
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 23 - 30
  • [23] Parallel-data-free Many-to-many Voice Conversion based on DNN Integrated with Eigenspace Using a Non-parallel Speech Corpus
    Hashimoto, Tetsuya
    Uchida, Hidetsugu
    Saito, Daisuke
    Minematsu, Nobuaki
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1278 - 1282
  • [24] SPEAKER ADAPTIVE MODEL BASED ON BOLTZMANN MACHINE FOR NON-PARALLEL TRAINING IN VOICE CONVERSION
    Nakashika, Toru
    Minami, Yasuhiro
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5530 - 5534
  • [25] Non-parallel training for voice conversion by maximum likelihood constrained adaptation
    Mouchtaris, A
    Van der Spiegel, J
    Mueller, P
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 1 - 4
  • [26] MoCoVC: Non-parallel Voice Conversion with Momentum Contrastive Representation Learning
    Onishi, Kotaro
    Nakashika, Toru
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1438 - 1443
  • [27] Non-parallel Voice Conversion using Weighted Generative Adversarial Networks
    Paul, Dipjyoti
    Pantazis, Yannis
    Stylianou, Yannis
    INTERSPEECH 2019, 2019, : 659 - 663
  • [28] A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data
    Tian, Xiaohai
    Chng, Eng Siong
    Li, Haizhou
    INTERSPEECH 2019, 2019, : 201 - 205
  • [29] Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning
    Al-Radhi, Mohammed Salah
    Csapo, Tamas Gabor
    Nemeth, Geza
    APPLIED SCIENCES-BASEL, 2021, 11 (16):
  • [30] Non-parallel Sequence-to-Sequence Voice Conversion for Arbitrary Speakers
    Zhang, Ying
    Che, Hao
    Wang, Xiaorui
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,