Improving the performance of MGM-based voice conversion by preparing training data method

Cited by: 0
Authors
Zuo, GY [1]
Liu, WJ [1]
Ruan, XG [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an approach that improves both the target speaker's individuality and the quality of the converted speech by preparing the training data. In mixture Gaussian spectral mapping (MGM) based voice conversion, spectral feature representations are analyzed to obtain the correct feature associations between the source and target characteristics. A voiced/unvoiced (V/UV) decision scheme for time alignment is provided to select the correct data for training the mixture Gaussian spectral mapping function while removing misaligned data. Experiments apply different spectral representation methods and V/UV decision strategies to the MGM functions. When linear predictive cepstral coefficients (LPCC) are used for time alignment and V/UV decisions are adopted to remove bad data, the results show that the conversion function achieves better accuracy and that the proposed method effectively improves the overall performance of voice conversion.
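
The data-preparation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that time alignment is done with plain dynamic time warping (DTW) over per-frame feature vectors such as LPCC, and that an aligned frame pair counts as misaligned when its source and target V/UV flags disagree. The function names `dtw_path` and `filter_by_vuv` and the toy data are hypothetical.

```python
import numpy as np


def dtw_path(src, tgt):
    """Align two feature sequences (frames x dims) with plain DTW.

    Returns a list of (source_frame, target_frame) index pairs.
    """
    n, m = len(src), len(tgt)
    # Euclidean distance between every source frame and every target frame.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=2)

    # Accumulated cost with an extra border row and column of infinities.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )

    # Backtrack from the last cell to the first to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def filter_by_vuv(path, src_voiced, tgt_voiced):
    """Drop aligned pairs whose V/UV flags disagree (assumed misalignment rule)."""
    return [(i, j) for i, j in path if src_voiced[i] == tgt_voiced[j]]


# Toy one-dimensional "features" standing in for LPCC vectors.
src = np.array([[0.0], [1.0], [2.0]])
tgt = np.array([[0.0], [0.0], [1.0], [2.0]])
src_voiced = [True, True, False]
tgt_voiced = [True, False, True, False]

path = dtw_path(src, tgt)
training_pairs = filter_by_vuv(path, src_voiced, tgt_voiced)
```

Only the pairs left in `training_pairs` would then be used to train the mapping function. Whether the paper keeps unvoiced-to-unvoiced pairs or only voiced-to-voiced pairs is not stated in the abstract, so the agreement rule here is one possible reading.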
Pages: 181-184
Number of pages: 4