Improving the performance of MGM-based voice conversion by preparing training data method

Cited by: 0

Authors:

Zuo, GY [1]

Liu, WJ [1]

Ruan, XG [1]

Affiliation:

[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China

Source:

2004 INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS | 2004

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes an approach that improves both the target speaker's individuality and the quality of the converted speech by preparing the training data. In mixture Gaussian spectral mapping (MGM) based voice conversion, spectral feature representations are analyzed to obtain the right feature associations between the source and target characteristics. A voiced/unvoiced (V/UV) decision scheme for time alignment is provided to obtain the right data for training the mixture Gaussian spectral mapping function while removing misaligned data. Experiments apply different spectral representation methods and V/UV decision strategies to the MGM functions. When linear predictive cepstral coefficients (LPCC) are used for time alignment and the V/UV decisions are adopted for removing bad data, the results show that the conversion function achieves better accuracy and that the proposed method effectively improves the overall performance of voice conversion.
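
The data-preparation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain dynamic time warping (DTW) with Euclidean frame distance as the time-alignment step, assumes that per-frame V/UV flags are already available, and assumes that an aligned pair is kept only when both frames are voiced. The function names `dtw_path` and `select_training_pairs` are hypothetical.

```python
import math


def dtw_path(source, target):
    """Align two feature sequences with basic DTW; return (i, j) index pairs.

    Assumption: Euclidean distance between frames and the standard
    three-way step pattern. The abstract does not specify these details.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last cell to recover the cheapest alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def select_training_pairs(source, target, source_voiced, target_voiced):
    """Keep aligned frame pairs only when both frames are flagged voiced.

    Assumption: this filtering rule is one plausible reading of
    "removing the misaligned data" via V/UV decisions.
    """
    pairs = []
    for i, j in dtw_path(source, target):
        if source_voiced[i] and target_voiced[j]:
            pairs.append((source[i], target[j]))
    return pairs
```

In this sketch the feature vectors would be LPCC frames of the source and target utterances, and the retained pairs would form the training set for the mapping function. How the V/UV flags are computed is left open here.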

Pages: 181-184

Page count: 4
