VOICE CONVERSION BASED ON MATRIX VARIATE GAUSSIAN MIXTURE MODEL

Cited by: 0
Authors
Saito, Daisuke [1]
Doi, Hidenobu [1]
Minematsu, Nobuaki [1]
Hirose, Keikichi [1]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
Keywords
Voice conversion; Gaussian mixture model; matrix variate distribution; matrix variate normal; matrix variate Gaussian mixture model; SPEECH RECOGNITION;
DOI
Not available
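Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809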
Abstract
This paper describes a novel approach to constructing a mapping function between a given speaker pair using probability density functions (PDFs) of matrix variates. In voice conversion studies, two important functions must be realized: 1) precise modeling of both the source and target feature spaces, and 2) construction of a proper transform function between these spaces. Voice conversion based on the Gaussian mixture model (GMM) is widely used because of its flexibility and ease of handling. In GMM-based approaches, a joint vector space of the source and target is first constructed, and the joint PDF of the two vectors is modeled as a GMM in that joint vector space. The joint vector approach focuses mainly on precise modeling of the 'joint' feature space and does not always construct a proper transform between the two feature spaces. In contrast, the proposed method constructs the joint PDF as a GMM in a matrix variate space whose rows and columns respectively correspond to the two functions, so it has the potential to model precisely both the characteristics of the feature spaces and the relation between the source and target spaces. Experimental results show that the proposed method contributes to improving the performance of voice conversion.
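For readers unfamiliar with matrix variate distributions, the following is a minimal Python sketch of the log density of a single matrix variate normal, the kind of component a matrix variate GMM is built from. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the matrix layout, so the sketch assumes each frame is a D x 2 matrix whose two columns hold the source and target feature vectors, and the names `matrix_normal_logpdf` and `mvn_logpdf` are hypothetical. The sketch also checks the standard identity that a matrix normal with row covariance U and column covariance V equals a multivariate normal on the column-stacked vector with covariance kron(V, U).

```python
import numpy as np


def matrix_normal_logpdf(X, M, U, V):
    """Log density of the matrix variate normal MN(M, U, V).

    X, M: (n, p) matrices; U: (n, n) row covariance; V: (p, p) column covariance.
    """
    n, p = X.shape
    R = X - M
    # Quadratic term tr(V^{-1} R^T U^{-1} R), using solves rather than inverses.
    quad = np.trace(np.linalg.solve(V, R.T) @ np.linalg.solve(U, R))
    _, logdet_U = np.linalg.slogdet(U)
    _, logdet_V = np.linalg.slogdet(V)
    return -0.5 * (n * p * np.log(2 * np.pi) + p * logdet_U + n * logdet_V + quad)


def mvn_logpdf(x, mu, S):
    """Log density of an ordinary multivariate normal N(mu, S)."""
    d = x - mu
    quad = d @ np.linalg.solve(S, d)
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + quad)


# Assumed layout (not from the abstract): D feature dimensions as rows,
# source and target speakers as the two columns.
D = 3
rng = np.random.default_rng(0)
A = rng.standard_normal((D, D))
U = A @ A.T + D * np.eye(D)             # covariance across feature dimensions
V = np.array([[1.0, 0.6], [0.6, 1.0]])  # covariance across source/target
M = rng.standard_normal((D, 2))         # component mean
X = rng.standard_normal((D, 2))         # one source/target frame pair

lp_matrix = matrix_normal_logpdf(X, M, U, V)
# Equivalent vector form: column-stacked vec(X) with Kronecker covariance.
lp_vec = mvn_logpdf(X.flatten(order="F"), M.flatten(order="F"), np.kron(V, U))
```

Under this assumed layout, U describes the structure of the feature space and V describes the relation between the two speakers, which is one way to read the abstract's remark that rows and columns correspond to the two functions.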
Pages: 567-571
Number of pages: 5
Related papers
50 items in total
  • [21] Voice Conversion Based on Gaussian Mixture Modules with Minimum Distance Spectral Mapping
    Jin, Gui
    Johnson, Michael T.
    Liu, Jia
    Lin, Xiaokang
    2015 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2015, : 356 - 359
  • [22] Voice conversion based on Gaussian processes by using kernels modeling the spectral density with Gaussian mixture models
    Bao, Jingyi
    Xu, Ning
    MODERN PHYSICS LETTERS B, 2018, 32 (34-36):
  • [23] A Voice Morphing Model Based on the Gaussian Mixture Model and Generative Topographic Mapping
    Rassam, Murad A.
    Almekhlafi, Rasha
    Alosaily, Eman
    Hassan, Haneen
    Hassan, Reem
    Saeed, Eman
    Alqershi, Elham
    EMERGING TRENDS IN INTELLIGENT COMPUTING AND INFORMATICS: DATA SCIENCE, INTELLIGENT INFORMATION SYSTEMS AND SMART COMPUTING, 2020, 1073 : 396 - 406
  • [24] Interpretable parametric voice conversion functions based on Gaussian mixture models and constrained transformations
    Erro, Daniel
    Alonso, Agustin
    Serrano, Luis
    Navas, Eva
    Hernaez, Inma
    COMPUTER SPEECH AND LANGUAGE, 2015, 30 (01): : 3 - 15
  • [25] STORYTELLING VOICE CONVERSION: EVALUATION EXPERIMENT USING GAUSSIAN MIXTURE MODELS
    Pribil, Jiri
    Pribilova, Anna
    Durackova, Daniela
    JOURNAL OF ELECTRICAL ENGINEERING-ELEKTROTECHNICKY CASOPIS, 2015, 66 (04): : 194 - 202
  • [26] Robust voice activity detection algorithm based on complex Gaussian mixture model
    Lei, Jian-Jun
    Yang, Zhen
    Liu, Gang
    Guo, Jun
    Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology, 2009, 42 (04): : 353 - 356
  • [27] VOICE CONVERSION BASED ON A MIXTURE DENSITY NETWORK
    Ahangar, Mohsen
    Ghorbandoost, Mostafa
    Sharma, Sudhendu
    Smith, Mark J. T.
    2017 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2017, : 329 - 333
  • [28] A Study on Bag of Gaussian Model with Application to Voice Conversion
    Qiao, Yu
    Tong, Tong
    Minematsu, Nobuaki
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 664 - +
  • [29] Voice Activity Detection Based on Sequential Gaussian Mixture Model with Maximum Likelihood Criterion
    Shen, Zhan
    Wei, Jianguo
    Lu, Wenhuan
    Dang, Jianwu
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [30] Speech Analysis/Synthesis by Gaussian Mixture Approximation of the Speech Spectrum for Voice Conversion
    Amini, Jamal
    Shahrebabaki, Abdoreza Sabzi
    Shokouhi, Navid
    Sheikhzadeh, Hamid
    Raahemifa, Kaamran
    Eslami, Mehdi
    2013 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (IEEE ISSPIT 2013), 2013, : 428 - 433