The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion

Cited by: 4

Authors:

Chen, Ling-Hui ^{[1,2]}

Liu, Li-Juan ^{[2]}

Ling, Zhen-Hua ^{[1]}

Jiang, Yuan ^{[2]}

Dai, Li-Rong ^{[1]}

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China

[2] IFLYTEK Res, Hefei, Anhui, Peoples R China

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

voice conversion; frequency warping; DNN; RNN; LSTM;

DOI:

10.21437/Interspeech.2016-456

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper introduces the methods we adopted to build our system for the Voice Conversion Challenge (VCC) 2016 evaluation. We propose to use neural network-based approaches to convert both spectral and excitation features. First, a generatively trained deep neural network (GTDNN) is adopted for spectral envelope conversion after the spectral envelopes have been pre-processed by frequency warping. Second, we propose to use a recurrent neural network (RNN) with long short-term memory (LSTM) cells for F0 trajectory conversion. In addition, we adopt a DNN for band aperiodicity conversion. Both internal tests and formal VCC evaluation results demonstrate the effectiveness of the proposed methods.

Pages: 1642-1646

Number of pages: 5

50 items in total

[31] High-quality voice conversion system based on GMM statistical parameters and RBF neural network
CHEN Xian-tong
ZHANG Ling-hua
The Journal of China Universities of Posts and Telecommunications, 2014, (05) : 68 - 75
[32] High-quality voice conversion system based on GMM statistical parameters and RBF neural network
CHEN Xian-tong
ZHANG Ling-hua
The Journal of China Universities of Posts and Telecommunications, 2014, 21 (05) : 68 - 75+93
[33] Mandarin-Tibetan Cross-Lingual Voice Conversion System Based on Deep Neural Network
Gan, Zhenye
Xing, Xiaotian
Yang, Hongwu
Zhao, Guangying
PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018), 2018 : 67 - 71
[34] Denoising Recurrent Neural Network for Deep Bidirectional LSTM based Voice Conversion
Wu, Jie
Huang, Dongyan
Xie, Lei
Li, Haizhou
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017 : 3379 - 3383
[35] Deep Neural Network based Voice Conversion with A Large Synthesized Parallel Corpus
Wen, Zhengqi
Li, Kehuang
Tao, Jianhua
Lee, Chin-Hui
2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016
[36] NEURAL-NETWORK-BASED F0 TEXT-TO-SPEECH SYNTHESIZER FOR MANDARIN
HWANG, SH
CHEN, SH
IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 1994, 141 (06) : 384 - 390
[37] Voice conversion based on quantum particle swarm optimization of generalized regression neural network
Wang Min
Zhao Yuan
Liu Li
Xu Juan
CHINESE JOURNAL OF LIQUID CRYSTALS AND DISPLAYS, 2018, 33 (02) : 165 - 173
[38] Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features
Zheng, Huadi
Cai, Weicheng
Zhou, Tianyan
Zhang, Shilei
Li, Ming
2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016 : 2872 - 2877
[39] F0 Modeling in HMM-Based Speech Synthesis System using Deep Belief Network
Mukherjee, Sankar
Mandal, Shyamal Kumar Das
2014 17TH ORIENTAL CHAPTER OF THE INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDIZATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (COCOSDA), 2014
[40] DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System
Zhang, Liqiang
Yu, Chengzhu
Lu, Heng
Weng, Chao
Zhang, Chunlei
Wu, Yusong
Xie, Xiang
Li, Zijin
Yu, Dong
INTERSPEECH 2020, 2020 : 1231 - 1235