Improving the performance of MGM-based voice conversion by preparing training data method

Cited by: 0

Authors:

Zuo, GY [1]

Liu, WJ [1]

Ruan, XG [1]

Affiliation:

[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China

Source:

2004 INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS | 2004

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes an approach that improves both the target speaker's individuality and the quality of the converted speech by preparing the training data. In mixture Gaussian spectral mapping (MGM) based voice conversion, spectral feature representations are analyzed to obtain the right feature associations between the source and target characteristics. A voiced/unvoiced (V/UV) decision scheme for time alignment is provided to obtain the right data for training the mixture Gaussian spectral mapping function while removing the misaligned data. Experiments are conducted on how the spectral representation methods and V/UV decision strategies affect the MGM functions. When linear predictive cepstral coefficients (LPCC) are used for time alignment and the V/UV decisions are adopted for removing bad data, the results show that the conversion function achieves better accuracy and that the proposed method effectively improves the overall performance of voice conversion.
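The data-preparation idea in the abstract (time-align source and target frames, then use V/UV decisions to discard misaligned pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the alignment algorithm or the exact rejection rule, so the sketch assumes plain dynamic time warping over per-frame feature vectors (standing in for LPCC) and assumes that an aligned pair is dropped whenever its two frames carry different V/UV labels. The function names `dtw_path` and `keep_voicing_matched` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(src: Sequence[Sequence[float]],
             tgt: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Align two feature sequences with plain dynamic time warping.

    Assumption: squared Euclidean distance between frame vectors.
    Returns (source_index, target_index) pairs along the optimal path.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((a - b) ** 2 for a, b in zip(src[i - 1], tgt[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return path


def keep_voicing_matched(path: Sequence[Tuple[int, int]],
                         src_voiced: Sequence[bool],
                         tgt_voiced: Sequence[bool]) -> List[Tuple[int, int]]:
    """Drop aligned pairs whose V/UV labels disagree (assumed rejection rule)."""
    return [(i, j) for i, j in path if src_voiced[i] == tgt_voiced[j]]


# Toy one-dimensional "features"; real input would be LPCC vectors per frame.
source_frames = [[0.0], [1.0], [1.0], [2.0]]
target_frames = [[0.0], [1.0], [2.0]]
source_voiced = [False, True, False, True]
target_voiced = [False, True, True]

aligned = dtw_path(source_frames, target_frames)
training_pairs = keep_voicing_matched(aligned, source_voiced, target_voiced)
```

In this toy case the third source frame is aligned to a target frame with a different voicing label, so that pair is excluded from the data that would be passed on to train the mapping function.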

Pages: 181-184

Page count: 4
