The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion

Cited by: 4
Authors
Chen, Ling-Hui [1, 2]
Liu, Li-Juan [2]
Ling, Zhen-Hua [1]
Jiang, Yuan [2]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] IFLYTEK Res, Hefei, Anhui, Peoples R China
Keywords
voice conversion; frequency warping; DNN; RNN; LSTM
DOI
10.21437/Interspeech.2016-456
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper introduces the methods we adopt to build our system for the evaluation event of Voice Conversion Challenge (VCC) 2016. We propose to use neural network-based approaches to convert both spectral and excitation features. First, the generatively trained deep neural network (GTDNN) is adopted for spectral envelope conversion after the spectral envelopes have been pre-processed by frequency warping. Second, we propose to use a recurrent neural network (RNN) with long short-term memory (LSTM) cells for F0 trajectory conversion. In addition, we adopt a DNN for band aperiodicity conversion. Both internal tests and formal VCC evaluation results demonstrate the effectiveness of the proposed methods.
Pages: 1642-1646
Number of pages: 5