VOICE CONVERSION BASED ON MATRIX VARIATE GAUSSIAN MIXTURE MODEL

Cited by: 0
Authors
Saito, Daisuke [1]
Doi, Hidenobu [1]
Minematsu, Nobuaki [1]
Hirose, Keikichi [1]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
Keywords
Voice conversion; Gaussian mixture model; matrix variate distribution; matrix variate normal; matrix variate Gaussian mixture model; speech recognition
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
This paper describes a novel approach to constructing a mapping function between a given speaker pair using matrix variate probability density functions (PDFs). In voice conversion studies, two important functions should be realized: 1) precise modeling of both the source and target feature spaces, and 2) construction of a proper transform function between these spaces. Voice conversion based on the Gaussian mixture model (GMM) is widely used because of its flexibility and ease of handling. In GMM-based approaches, a joint vector space of the source and target is first constructed, and the joint PDF of the two vectors is modeled as a GMM in that joint vector space. The joint vector approach focuses mainly on precise modeling of the 'joint' feature space and does not always construct a proper transform between the two feature spaces. In contrast, the proposed method constructs the joint PDF as a GMM in a matrix variate space whose rows and columns respectively correspond to the two functions, which gives it the potential to model precisely both the characteristics of the feature spaces and the relation between the source and target spaces. Experimental results show that the proposed method improves voice conversion performance.
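For orientation, the kind of density the abstract refers to can be sketched with the textbook definition of the matrix variate normal distribution and a mixture built from it. The symbols below (X, M_k, U_k, V_k, w_k, n, p, K) are assumptions chosen for illustration and are not taken from the paper, whose exact parameterization and matrix layout may differ.

```latex
% Matrix variate normal density for X in R^{n x p}
% (M: n x p mean, U: n x n row covariance, V: p x p column covariance)
\mathcal{MN}(X \mid M, U, V) =
  \frac{\exp\!\left(-\tfrac{1}{2}\,
        \mathrm{tr}\!\left[V^{-1}(X-M)^{\top}U^{-1}(X-M)\right]\right)}
       {(2\pi)^{np/2}\,|V|^{n/2}\,|U|^{p/2}}

% Equivalent vectorized form: an ordinary Gaussian with Kronecker-structured covariance
\mathrm{vec}(X) \sim \mathcal{N}\!\left(\mathrm{vec}(M),\; V \otimes U\right)

% Matrix variate GMM with K components (hypothetical mixture weights w_k)
p(X) = \sum_{k=1}^{K} w_k\,\mathcal{MN}(X \mid M_k, U_k, V_k),
\qquad \sum_{k=1}^{K} w_k = 1,\quad w_k \ge 0
```

Under this definition the row covariance U and the column covariance V are separate factors, which is one way to read the abstract's statement that rows and columns correspond to the two modeling functions. A conventional joint-vector GMM instead places a single unstructured covariance on the stacked source and target vector.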
Pages: 567-571
Page count: 5