MODEL-MAPPING BASED VOICE CONVERSION SYSTEM: A Novel Approach to Improve Voice Similarity and Naturalness using Model-based Speech Synthesis Techniques

Cited by: 0

Authors:

Li, Baojie [1]

Wu, Dalei [1]

Jiang, Hui [1]

Affiliation:

[1] York Univ, Dept Comp Sci & Engn, 4700 Keele St, Toronto, ON M3J 1P3, Canada

Source:

BIOSIGNALS 2010: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON BIO-INSPIRED SYSTEMS AND SIGNAL PROCESSING | 2010

Keywords:

Voice conversion; HMM-based speech synthesis; GMM; Model mapping

DOI:

Not available

Chinese Library Classification number:

R318 [Biomedical engineering]

Subject classification code:

0831

Abstract:

In this paper we present a novel voice conversion application in which no knowledge of the source speakers is available; only sufficient utterances from a target speaker and from a number of other speakers are at hand. Our approach consists of two separate stages. At the training stage, we estimate a speaker-dependent (SD) Gaussian mixture model (GMM) for the target speaker, and we also estimate a speaker-independent (SI) GMM using data from a number of speakers other than the source speaker. A mapping between the SD and the SI models is maintained during training for each phone label. At the conversion stage, we use the SI GMM to recognize each input frame and find the closest Gaussian mixture component for it. Next, according to a mapping list, the counterpart Gaussian of the SD GMM is obtained and used to generate a parameter vector for that frame. Finally, all the generated vectors are concatenated to synthesize speech of the target speaker. With the proposed model-mapping approach, we can not only avoid over-fitting by keeping the number of mixtures of the SI GMM at a fixed value, but also improve voice quality in terms of similarity and naturalness by increasing the number of mixtures of the SD GMM. Experiments showed the effectiveness of this method.
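
The conversion stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes diagonal-covariance Gaussians, a one-to-one mapping list from SI components to SD components, and that the generated parameter vector is simply the mean of the mapped SD Gaussian. The function names (`closest_component`, `convert`) and all toy values are hypothetical.

```python
import numpy as np

def closest_component(frames, means, variances, weights):
    """For each frame, return the index of the diagonal-covariance Gaussian
    with the highest weighted log-likelihood."""
    diff = frames[:, None, :] - means[None, :, :]  # shape (T, K, D)
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )  # shape (T, K)
    return np.argmax(log_lik + np.log(weights), axis=1)

def convert(frames, si_means, si_vars, si_weights, mapping, sd_means):
    """Map each input frame to a target-speaker parameter vector.

    Assumption: the generated vector is the mean of the mapped SD Gaussian.
    """
    si_idx = closest_component(frames, si_means, si_vars, si_weights)
    sd_idx = mapping[si_idx]  # look up the counterpart SD Gaussian
    return sd_means[sd_idx]   # one generated vector per frame

# Toy setup (all values hypothetical): 2-D features, two Gaussians per model.
si_means = np.array([[0.0, 0.0], [5.0, 5.0]])
si_vars = np.ones((2, 2))
si_weights = np.array([0.5, 0.5])
sd_means = np.array([[10.0, 10.0], [-10.0, -10.0]])
mapping = np.array([1, 0])  # SI component 0 -> SD component 1, and 1 -> 0

frames = np.array([[0.1, -0.2], [4.8, 5.3]])
converted = convert(frames, si_means, si_vars, si_weights, mapping, sd_means)
print(converted)
```

In this toy setup the first frame is closest to SI component 0 and the second to SI component 1, so the sketch emits the means of SD components 1 and 0 respectively. The abstract also states that the mapping is maintained per phone label and that the SD GMM may use more mixtures than the SI GMM; the one-to-one lookup above leaves both aspects out.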


Pages: 442-446

Page count: 5
