Incorporating Global Variance in the Training Phase of GMM-based Voice Conversion

Cited by: 0
Authors
Hwang, Hsin-Te [1 ,3 ]
Tsao, Yu [2 ]
Wang, Hsin-Min [3 ]
Wang, Yih-Ru [1 ]
Chen, Sin-Horng [1 ]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
[2] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
[3] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Maximum likelihood-based trajectory mapping considering global variance (MLGV-based trajectory mapping) has been proposed for improving the quality of the converted speech in Gaussian mixture model-based voice conversion (GMM-based VC). Although the quality of the converted speech is significantly improved, the computational cost of the online conversion process also increases, because parameter generation in MLGV-based trajectory mapping has no closed-form solution and generally requires an iterative process. To reduce the online computational cost, we propose to incorporate GV in the training phase of GMM-based VC. The conversion process can then simply adopt ML-based trajectory mapping (without considering GV in the conversion phase), which has a closed-form solution. In this way, the quality of the converted speech is expected to improve without increasing the online computational cost. Our experimental results demonstrate that the proposed method yields a significant improvement in the quality of the converted speech compared with the conventional GMM-based VC method. Meanwhile, compared with MLGV-based trajectory mapping, the proposed method provides comparable converted speech quality at a reduced computational cost in the conversion process.
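The computational claim in the abstract is that ML-based trajectory mapping reduces to a single closed-form solve, whereas the GV-augmented objective does not. The following minimal NumPy sketch illustrates that closed-form step for one feature dimension; the window layout, delta definition, and variable names are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal NumPy sketch of closed-form ML trajectory mapping (MLPG-style
# parameter generation), the step kept online in the proposed method.
# T, the delta window, and the mean/variance sequences are illustrative
# assumptions, not the paper's exact implementation.
import numpy as np

def ml_trajectory(mean_seq, var_seq):
    """Closed-form ML static trajectory for one feature dimension.

    mean_seq, var_seq: (2*T,) per-frame means/variances of the stacked
    [static; delta] features predicted by the conversion GMM.
    Returns the (T,) static trajectory that maximizes the likelihood.
    """
    T = mean_seq.shape[0] // 2
    # Window matrix W maps the static trajectory y (T,) to [static; delta] (2T,)
    W = np.zeros((2 * T, T))
    W[:T, :] = np.eye(T)                    # static part: identity
    for t in range(T):                      # delta part: 0.5*(y[t+1] - y[t-1])
        if t > 0:
            W[T + t, t - 1] = -0.5
        if t < T - 1:
            W[T + t, t + 1] = 0.5
    Dinv = np.diag(1.0 / var_seq)           # diagonal precision matrix
    # Weighted least squares: y = (W' D^-1 W)^-1 W' D^-1 m  -- one linear solve
    A = W.T @ Dinv @ W
    b = W.T @ Dinv @ mean_seq
    return np.linalg.solve(A, b)

# Adding a global-variance term to this objective makes it non-quadratic in y,
# so MLGV-based mapping needs gradient iterations at conversion time; moving
# the GV constraint into GMM training keeps this single solve sufficient online.
```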
Pages: 6
Related papers (50 in total)
  • [31] GMM-based GNSS spoofing detector using double differential phase measurement
    Vinh, La The
    Nguyen, Van Hien
    Van, Hiep Hoang
    Dinh, Thuan Nguyen
    Hung, Pham Ngoc
    Ta, Tung Hai
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (04)
  • [32] Design and Implementation of Voice Conversion System Based on GMM and ANN
    Yang, Man
    Que, Dashun
    Li, Bei
    MULTIMEDIA AND SIGNAL PROCESSING, 2012, 346 : 624 - 631
  • [33] Frame Correlation Based Autoregressive GMM Method for Voice Conversion
    Li, Xian
    Wang, Zeng-fu
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 221 - 225
  • [34] A Multi-level GMM-Based Cross-Lingual Voice Conversion Using Language-Specific Mixture Weights for Polyglot Synthesis
    B. Ramani
    M. P. Actlin Jeeva
    P. Vijayalakshmi
    T. Nagarajan
    Circuits, Systems, and Signal Processing, 2016, 35 : 1283 - 1311
  • [35] Maximum Likelihood Voice Conversion Based on GMM with STRAIGHT Mixed Excitation
    Ohtani, Yamato
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2266 - 2269
  • [36] Voice Conversion for TTS Systems with Tuning on the Target Speaker Based on GMM
    Zahariev, Vadim
    Azarov, Elias
    Petrovsky, Alexander
    SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 788 - 798
  • [37] Improving Segmental GMM Based Voice Conversion Method with Target Frame Selection
    Gu, Hung-Yan
    Tsai, Sung-Fung
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 483 - 487
  • [38] Comprehensive Voice Conversion Analysis Based on D_GMM and Feature Combination
    Pan, He
    Wei, Yangjie
    Guan, Nan
    Wang, Yi
    ASIA MODELLING SYMPOSIUM 2014 (AMS 2014), 2014, : 159 - 164
  • [39] Adaptive Training for Voice Conversion Based on Eigenvoices
    Ohtani, Yamato
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (06): : 1589 - 1598
  • [40] Voice conversion based on joint pitch and spectral transformation with component-group-GMM
    Ma, JC
    Liu, WJ
    PROCEEDINGS OF THE 2005 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (IEEE NLP-KE'05), 2005, : 199 - 203