Incorporating Global Variance in the Training Phase of GMM-based Voice Conversion

Cited by: 0
Authors
Hwang, Hsin-Te [1, 3]
Tsao, Yu [2]
Wang, Hsin-Min [3]
Wang, Yih-Ru [1]
Chen, Sin-Horng [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
[2] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
[3] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
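Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject classification codes
0808; 0809
Abstract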
Maximum likelihood-based trajectory mapping considering global variance (MLGV-based trajectory mapping) has been proposed to improve the quality of the converted speech in Gaussian mixture model-based voice conversion (GMM-based VC). Although the quality of the converted speech is significantly improved, the computational cost of the online conversion process also increases, because MLGV-based trajectory mapping has no closed-form solution for parameter generation and generally requires an iterative process. To reduce the online computational cost, we propose incorporating GV in the training phase of GMM-based VC. The conversion process can then simply adopt ML-based trajectory mapping (without considering GV in the conversion phase), which has a closed-form solution. In this way, the quality of the converted speech is expected to improve without increasing the online computational cost. Our experimental results demonstrate that the proposed method yields a significant improvement in the quality of the converted speech compared with the conventional GMM-based VC method. Moreover, compared with MLGV-based trajectory mapping, the proposed method provides comparable converted speech quality at a reduced computational cost in the conversion process.
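The two quantities the abstract contrasts can be illustrated with a short NumPy sketch. This is not the authors' implementation: it assumes a single one-dimensional feature stream, static plus delta features with a delta window of (-0.5, 0, 0.5), and diagonal covariances, and every function name is hypothetical. In a real GMM-based VC system the per-frame means and variances would come from the conditional distribution of the target features given the source features; here they are simply passed in. Under these assumptions, the ML trajectory is the solution of one linear system, whereas the global variance is just the per-utterance variance of the generated trajectory.

```python
import numpy as np


def build_window_matrix(num_frames):
    """Stack static and delta windows into a (2T, T) matrix W.

    Row 2t picks the static value at frame t; row 2t+1 computes the
    delta 0.5 * (y[t+1] - y[t-1]), dropping out-of-range neighbours.
    """
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0
        if t > 0:
            W[2 * t + 1, t - 1] = -0.5
        if t < num_frames - 1:
            W[2 * t + 1, t + 1] = 0.5
    return W


def ml_trajectory(means, variances):
    """Closed-form ML trajectory for one feature dimension.

    means, variances: arrays of shape (T, 2) holding the static and
    delta statistics per frame (assumed inputs, diagonal covariance).
    Solves (W^T P W) y = W^T P mu, where P is the diagonal precision.
    """
    num_frames = means.shape[0]
    W = build_window_matrix(num_frames)
    precision = 1.0 / variances.reshape(-1)
    lhs = W.T @ (precision[:, None] * W)
    rhs = W.T @ (precision * means.reshape(-1))
    return np.linalg.solve(lhs, rhs)


def global_variance(trajectory):
    """Per-dimension variance of a trajectory over the frames of an utterance."""
    return np.var(trajectory, axis=0)
```

Because `ml_trajectory` needs only one linear solve, its cost is fixed per utterance. A GV-aware objective adds a term that depends nonlinearly on `global_variance(y)`, which is why it is typically optimised iteratively.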
Pages: 6
Related papers
50 in total
  • [41] AN IMPROVED ALGORITHM OF GMM VOICE CONVERSION SYSTEM BASED ON CHANGING THE TIME-SCALE
    Zhou Ying, Zhang Linghua (College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China)
    Journal of Electronics (China), 2011, 28 (Z1) : 518 - 523
  • [42] Performance of new voice conversion systems based on GMM models and applied to Arabic language
    Guerid, A.
    Houacine, A.
    Andre-Obrecht, R.
    Lachambre, H.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (04) : 477 - 485
  • [43] AN IMPROVED ALGORITHM OF GMM VOICE CONVERSION SYSTEM BASED ON CHANGING THE TIME-SCALE
    Zhou Ying, Zhang Linghua (College of Telecommunications & Information Engineering)
    Journal of Electronics (China), 2011, (Z1) : 518 - 523
  • [44] ONE SENTENCE VOICE ADAPTATION USING GMM-BASED FREQUENCY-WARPING AND SHIFT WITH A SUB-BAND BASIS SPECTRUM MODEL
    Tamura, Masatsune
    Morita, Masahiro
    Kagoshima, Takehiko
    Akamine, Masami
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011 : 5124 - 5127
  • [45] Novel Inter Mixture Weighted GMM Posteriorgram for DNN and GAN-based Voice Conversion
    Shah, Nirmesh J.
    Sreeraj, R.
    Shah, Neil
    Patil, Hemant A.
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018 : 1776 - 1781
  • [46] Nonparallel training for voice conversion based on a parameter adaptation approach
    Mouchtaris, A
    Van der Spiegel, J
    Mueller, P
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (03) : 952 - 963
  • [47] Voice conversion based on feature combination with limited training data
    Ghorbandoost, Mostafa
    Sayadiyan, Abolghasem
    Ahangar, Mohsen
    Sheikhzadeh, Hamid
    Shahrebabaki, Abdoreza Sabzi
    Amini, Jamal
    SPEECH COMMUNICATION, 2015, 67 : 113 - 128
  • [48] High-quality voice conversion system based on GMM statistical parameters and RBF neural network
    CHEN Xian-tong
    ZHANG Ling-hua
    The Journal of China Universities of Posts and Telecommunications, 2014, (05) : 68 - 75
  • [49] High-quality voice conversion system based on GMM statistical parameters and RBF neural network
    CHEN Xian-tong
    ZHANG Ling-hua
    The Journal of China Universities of Posts and Telecommunications, 2014, 21 (05) : 68 - 75+93
  • [50] Voice conversion based on Gaussian processes by coherent and asymmetric training with limited training data
    Xu, Ning
    Tang, Yibing
    Bao, Jingyi
    Jiang, Aiming
    Liu, Xiaofeng
    Yang, Zhen
    SPEECH COMMUNICATION, 2014, 58 : 124 - 138