Maximum Likelihood Voice Conversion Based on GMM with STRAIGHT Mixed Excitation

Citations: 0
Authors
Ohtani, Yamato [1]
Toda, Tomoki [1]
Saruwatari, Hiroshi [1]
Shikano, Kiyohiro [1]
Affiliation
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
Keywords
Speech synthesis; Voice conversion; Gaussian mixture model; STRAIGHT; Mixed excitation;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The performance of voice conversion has been considerably improved through statistical modeling of spectral sequences. However, the converted speech still contains traces of artificial sounds. To alleviate this, it is necessary to statistically model a source sequence as well as a spectral sequence. In this paper, we introduce STRAIGHT mixed excitation into a voice conversion framework based on a Gaussian Mixture Model (GMM) of the joint probability density of source and target features. We convert both spectral and source feature sequences based on Maximum Likelihood Estimation (MLE). Objective and subjective evaluation results demonstrate that the proposed source conversion produces strong improvements in both the converted speech quality and the conversion accuracy for speaker individuality.
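The joint-density GMM mapping mentioned in the abstract can be illustrated with a minimal sketch. It assumes scalar (1-D) features and uses a frame-wise conditional-mean mapping, which is a simpler relative of the maximum likelihood sequence conversion the paper describes; it does not model STRAIGHT mixed excitation. The function names, dictionary keys, and toy parameters are hypothetical and are not taken from the paper.

```python
import math

def gauss(x, mean, var):
    """Density of a 1-D Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def convert_frame(x, mixtures):
    """Map one source value x to the conditional mean E[y | x] under a joint GMM.

    Each mixture is a dict with hypothetical keys:
      w: weight, mx/my: source/target means,
      vxx: source variance, vyx: target-source covariance.
    """
    # Posterior probability of each mixture component given the source value.
    likes = [m["w"] * gauss(x, m["mx"], m["vxx"]) for m in mixtures]
    total = sum(likes)
    posts = [like / total for like in likes]
    # Per-component linear regression from source to target, weighted by posterior.
    return sum(p * (m["my"] + m["vyx"] / m["vxx"] * (x - m["mx"]))
               for p, m in zip(posts, mixtures))

# Toy parameters, made up for illustration (not from the paper).
TOY_GMM = [
    {"w": 0.5, "mx": -1.0, "my": -2.0, "vxx": 1.0, "vyx": 0.5},
    {"w": 0.5, "mx": 1.0, "my": 2.0, "vxx": 1.0, "vyx": 0.5},
]

# Convert a short sequence of source values frame by frame.
converted = [convert_frame(x, TOY_GMM) for x in (-1.0, 0.0, 1.0)]
```

In a real system the features would be vectors (for example spectral parameters and excitation parameters), the GMM would be trained on time-aligned source and target frames, and the scalar division by `vxx` would become a matrix inverse.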
Pages: 2266-2269
Number of pages: 4
Related papers
50 in total
  • [21] Comparing ANN and GMM in a voice conversion framework
    Laskar, R. H.
    Chakrabarty, D.
    Talukdar, F. A.
    Rao, K. Sreenivasa
    Banerjee, K.
    APPLIED SOFT COMPUTING, 2012, 12 (11) : 3332 - 3342
  • [22] Improving Segmental GMM Based Voice Conversion Method with Target Frame Selection
    Gu, Hung-Yan
    Tsai, Sung-Fung
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 483 - 487
  • [23] Incorporating Global Variance in the Training Phase of GMM-based Voice Conversion
    Hwang, Hsin-Te
    Tsao, Yu
    Wang, Hsin-Min
    Wang, Yih-Ru
    Chen, Sin-Horng
    2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [24] NON-PARALLEL TRAINING FOR VOICE CONVERSION BASED ON FT-GMM
    Chen, Ling-Hui
    Ling, Zhen-Hua
    Dai, Li-Rong
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5116 - 5119
  • [25] GMM-Based Speaker Gender and Age Classification After Voice Conversion
    Pribil, Jiri
    Pribilova, Anna
    Matousek, Jindrich
    2016 FIRST INTERNATIONAL WORKSHOP ON SENSING, PROCESSING AND LEARNING FOR INTELLIGENT MACHINES (SPLINE), 2016,
  • [26] Comprehensive Voice Conversion Analysis Based on D_GMM and Feature Combination
    Pan, He
    Wei, Yangjie
    Guan, Nan
    Wang, Yi
    ASIA MODELLING SYMPOSIUM 2014 (AMS 2014), 2014, : 159 - 164
  • [27] A CODEBOOK COMPENSATIVE VOICE MORPHING ALGORITHM BASED ON MAXIMUM LIKELIHOOD ESTIMATION
    Xu, Ning
    Yang, Zhen
    Zhang, Linhua
    JOURNAL OF ELECTRONICS (CHINA), 2009, 26 (03) : 346 - 352
  • [28] Visual-to-Speech Conversion Based on Maximum Likelihood Estimation
    Ra, Rina
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    PROCEEDINGS OF THE FIFTEENTH IAPR INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS - MVA2017, 2017, : 518 - 521
  • [29] Modulation Spectrum-Based Post-Filter for GMM-Based Voice Conversion
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [30] Robust Authentication Using Likelihood Ratio and GMM for the Fusion of Voice and Face
    Bengherabi, Messaoud
    Mezai, Lamia
    Harizi, Farid
    Guessoum, Abderrazak
    Cheriet, Mohamed
    2009 3RD INTERNATIONAL CONFERENCE ON SIGNALS, CIRCUITS AND SYSTEMS (SCS 2009), 2009, : 390 - +