Incorporating Global Variance in the Training Phase of GMM-based Voice Conversion

被引:0
|
作者
Hwang, Hsin-Te [1 ,3 ]
Tsao, Yu [2 ]
Wang, Hsin-Min [3 ]
Wang, Yih-Ru [1 ]
Chen, Sin-Horng [1 ]
机构
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
[2] Acad Sinica, Res Ctr Infomrat Technol Innovat, Taipei, Taiwan
[3] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
关键词
D O I
暂无
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Maximum likelihood-based trajectory mapping considering global variance (MLGV-based trajectory mapping) has been proposed for improving the quality of the converted speech of Gaussian mixture model-based voice conversion (GMM-based VC). Although the quality of the converted speech is significantly improved, the computational cost of the online conversion process is also increased because there is no closed form solution for parameter generation in MLGV-based trajectory mapping, and an iterative process is generally required. To reduce the online computational cost, we propose to incorporate GV in the training phase of GMM-based VC. Then, the conversion process can simply adopt ML-based trajectory mapping (without considering GV in the conversion phase), which has a closed form solution. In this way, it is expected that the quality of the converted speech can be improved without increasing the online computational cost. Our experimental results demonstrate that the proposed method yields a significant improvement in the quality of the converted speech comparing to the conventional GMM-based VC method. Meanwhile, comparing to MLGV-based trajectory mapping, the proposed method provides comparable converted speech quality with reduced computational cost in the conversion process.
引用
收藏
页数:6
相关论文
共 50 条
  • [1] Alleviating the Over-Smoothing Problem in GMM-Based Voice Conversion with Discriminative Training
    Hwang, Hsin-Te
    Tsao, Yu
    Wang, Hsin-Min
    Wang, Yih-Ru
    Chen, Sin-Horng
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3061 - 3065
  • [2] MODULATION SPECTRUM-CONSTRAINED TRAJECTORY TRAINING ALGORITHM FOR GMM-BASED VOICE CONVERSION
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4859 - 4863
  • [3] GMM-Based Speaker Gender and Age Classification After Voice Conversion
    Pribil, Jiri
    Pribilova, Anna
    Matousek, Jindrich
    2016 FIRST INTERNATIONAL WORKSHOP ON SENSING, PROCESSING AND LEARNING FOR INTELLIGENT MACHINES (SPLINE), 2016,
  • [4] Enhancing a Glossectomy Patient's Speech via GMM-based Voice Conversion
    Tanaka, Kei
    Hara, Sunao
    Abe, Masanobu
    Minagi, Shogo
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [5] Voice Conversion Using Bilinear Model Integrated with Joint GMM-based Classification
    Sun, Xinjian
    Zhang, Xiongwei
    Yang, Jibin
    Cao, Tieyong
    2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2013, : 1225 - 1228
  • [6] Modulation Spectrum-Based Post-Filter for GMM-Based Voice Conversion
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [7] Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech
    Nakamura, Keigo
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    SPEECH COMMUNICATION, 2012, 54 (01) : 134 - 146
  • [8] EXPLORING MUTUAL INFORMATION FOR GMM-BASED SPECTRAL CONVERSION
    Hwang, Hsin-Te
    Tsao, Yu
    Wang, Hsin-Min
    Wang, Yih-Ru
    Chen, Sin-Horng
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 50 - 54
  • [9] A Study of Mutual Information for GMM-Based Spectral Conversion
    Hwang, Hsin-Te
    Tsao, Yu
    Wang, Hsin-Min
    Wang, Yih-Ru
    Chen, Sin-Horng
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 78 - 81
  • [10] Voice conversion based on trajectory model training of neural networks considering global variance
    Hosaka, Naoki
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 307 - 311