Speech Analysis/Synthesis by Gaussian Mixture Approximation of the Speech Spectrum for Voice Conversion

Authors
Amini, Jamal [1]
Shahrebabaki, Abdoreza Sabzi [1]
Shokouhi, Navid [1]
Sheikhzadeh, Hamid [1]
Raahemifar, Kaamran [2]
Eslami, Mehdi [1]
Affiliations
[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Keywords
Analysis/Synthesis; Feature Extraction; Voice Conversion; GMM; STRAIGHT; FREQUENCY
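DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract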
Voice conversion typically employs spectral features to convert a source voice to a target voice. In this paper, we propose a simple method of fitting the STRAIGHT spectrum with Gaussian mixture (GM) models for speech analysis/synthesis and spectral modification. The mean values of the Gaussians are pre-determined based on Mel-frequency spacing. The standard deviations are also adaptively adjusted using the constant-Q principle and the spectrum amplitudes. Finally, the weights of the Gaussians are determined by sampling the log-spectrum at Mel-frequencies. The proposed analysis/synthesis method (MFLS-GM) is employed for speech analysis/synthesis and voice conversion. Subjective evaluations employing MOS and ABX demonstrate superior performance of the voice conversion using the MFLS-GM compared to systems employing MFCC features. The computation cost of the proposed analysis/synthesis method is also much lower than those based on MFCC.
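The three steps named in the abstract (Mel-spaced means, constant-Q standard deviations, weights sampled from the log-spectrum) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Mel formula variant, the quality factor `q`, and the normalization of the Gaussian basis are all assumptions made for illustration, and the amplitude-dependent adaptation of the standard deviations mentioned in the abstract is omitted because the record does not specify it. A generic log-spectral envelope stands in for the STRAIGHT spectrum.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel-scale formula (assumed; the record does not say which variant is used).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_centers(n_gauss, fs):
    # Gaussian means fixed in advance, equally spaced on the Mel axis
    # strictly between 0 Hz and the Nyquist frequency.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_gauss + 2)[1:-1]
    return mel_to_hz(mels)

def analyze(log_spec, freqs, centers):
    # Weights: the log-spectrum sampled (by linear interpolation) at the Mel centers.
    return np.interp(centers, freqs, log_spec)

def synthesize(weights, centers, freqs, q=4.0):
    # Constant-Q assumption: standard deviation proportional to center frequency.
    # The value q=4.0 is hypothetical.
    sigma = centers / q
    basis = np.exp(-0.5 * ((freqs[:, None] - centers[None, :]) / sigma[None, :]) ** 2)
    # Normalizing each row is an assumption of this sketch; it makes the
    # reconstruction a weighted average of the sampled log-spectrum values.
    basis /= basis.sum(axis=1, keepdims=True)
    return basis @ weights

if __name__ == "__main__":
    fs = 16000
    freqs = np.linspace(0.0, fs / 2.0, 257)
    # Synthetic smooth log-spectral envelope used only as stand-in input.
    log_spec = -2.0 + np.cos(2.0 * np.pi * freqs / 4000.0) - freqs / 8000.0
    centers = mel_centers(40, fs)
    weights = analyze(log_spec, freqs, centers)
    approx = synthesize(weights, centers, freqs)
    print("RMS log-spectral error:", np.sqrt(np.mean((approx - log_spec) ** 2)))
```

In this sketch a frame is represented only by the vector `weights`, since the means and standard deviations are fixed by the Mel spacing and `q`; spectral modification would then amount to transforming that vector before calling `synthesize`.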
Pages: 428-433
Page count: 6