Speech Analysis/Synthesis by Gaussian Mixture Approximation of the Speech Spectrum for Voice Conversion

Times cited: 0

Authors:

Amini, Jamal [1]

Shahrebabaki, Abdoreza Sabzi [1]

Shokouhi, Navid [1]

Sheikhzadeh, Hamid [1]

Raahemifar, Kaamran [2]

Eslami, Mehdi [1]

Affiliations:

[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran

[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada

Source:

2013 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (IEEE ISSPIT 2013) | 2013

Keywords:

Analysis/Synthesis; Feature Extraction; Voice Conversion; GMM; STRAIGHT; FREQUENCY;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

Voice conversion typically uses spectral features to convert a source voice into a target voice. In this paper, we propose a simple method that fits the STRAIGHT spectrum with a Gaussian mixture (GM) model for speech analysis/synthesis and spectral modification. The means of the Gaussians are predetermined by Mel-frequency spacing. The standard deviations are adaptively adjusted using the constant-Q principle and the spectrum amplitudes. Finally, the weights of the Gaussians are obtained by sampling the log-spectrum at the Mel frequencies. The proposed method (MFLS-GM) is applied to speech analysis/synthesis and to voice conversion. Subjective evaluations using mean opinion score (MOS) and ABX tests show that voice conversion with MFLS-GM outperforms systems based on MFCC features. The computational cost of the proposed analysis/synthesis method is also much lower than that of MFCC-based methods.
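The three fitting steps in the abstract (Mel-spaced means, constant-Q widths, weights sampled from the log-spectrum) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' MFLS-GM implementation: the Mel formula `2595 * log10(1 + f / 700)`, the choice `sigma = mean / q`, the use of unnormalized Gaussians summed to approximate the log-spectrum, and the names `analyze`, `synthesize`, `interp` and `q` are all hypothetical. The amplitude-dependent adjustment of the standard deviations is omitted because its exact form is not given here.

```python
import bisect
import math

# Hypothetical sketch of Mel-spaced Gaussian-mixture fitting of a log-spectrum.
# Assumptions (not from the paper): the Mel formula below, sigma = mean / q,
# and unnormalized Gaussians summed to approximate the log-spectrum.


def hz_to_mel(f):
    # A common Mel-scale formula; the paper's exact variant is an assumption.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def interp(x, xs, ys):
    """Linearly interpolate ys(xs) at x; xs must be strictly ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def analyze(freqs, log_spec, n_gauss, q):
    """Return (means, sigmas, weights) of a GM approximation of log_spec."""
    m_max = hz_to_mel(freqs[-1])
    # Means: n_gauss centers equally spaced on the Mel scale up to the top bin.
    means = [mel_to_hz(m_max * k / n_gauss) for k in range(1, n_gauss + 1)]
    # Constant-Q assumption: width grows in proportion to center frequency.
    sigmas = [mu / q for mu in means]
    # Weights: the log-spectrum sampled at each Mel-spaced center.
    weights = [interp(mu, freqs, log_spec) for mu in means]
    return means, sigmas, weights


def synthesize(weights, means, sigmas, freqs):
    """Evaluate the weighted sum of unnormalized Gaussians at each frequency."""
    return [
        sum(w * math.exp(-0.5 * ((f - mu) / s) ** 2)
            for w, mu, s in zip(weights, means, sigmas))
        for f in freqs
    ]


# Toy example: a 513-bin frame up to 8 kHz with a linearly decaying log-spectrum.
freqs = [k * 8000.0 / 512 for k in range(513)]
log_spec = [-f / 1000.0 for f in freqs]
means, sigmas, weights = analyze(freqs, log_spec, n_gauss=24, q=8.0)
approx = synthesize(weights, means, sigmas, freqs)
```

In this sketch only the weights depend on the frame's spectrum, so a frame is summarized by `n_gauss` numbers; in the method described in the abstract the standard deviations would additionally adapt to the spectrum amplitudes.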


Pages: 428-433

Number of pages: 6
