The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion

Times cited: 4
Authors
Chen, Ling-Hui [1, 2]
Liu, Li-Juan [2]
Ling, Zhen-Hua [1]
Jiang, Yuan [2]
Dai, Li-Rong [1]
Affiliations
[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, Anhui, People's Republic of China
[2] iFLYTEK Research, Hefei, Anhui, People's Republic of China
Keywords
voice conversion; frequency warping; DNN; RNN; LSTM
DOI
10.21437/Interspeech.2016-456
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper introduces the methods we adopted to build our system for the evaluation event of the Voice Conversion Challenge (VCC) 2016. We propose to use neural network-based approaches to convert both spectral and excitation features. First, a generatively trained deep neural network (GTDNN) is adopted for spectral envelope conversion after the spectral envelopes have been pre-processed by frequency warping. Second, we propose to use a recurrent neural network (RNN) with long short-term memory (LSTM) cells for F0 trajectory conversion. In addition, we adopt a DNN for band aperiodicity conversion. Both internal tests and the formal VCC evaluation results demonstrate the effectiveness of the proposed methods.
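The abstract states only that the spectral envelopes are pre-processed by frequency warping before GTDNN conversion; it does not specify the warping function. As a rough illustration, the sketch below assumes a hypothetical piecewise-linear warping defined by pairs of matched source/target anchor positions (for example, formant locations), expressed as bin indices on a uniform frequency grid. Each output bin is mapped back through the inverse warp and the source envelope is linearly interpolated at that position. The function names `warp_envelope` and `piecewise_linear` are illustrative and do not come from the paper.

```python
from bisect import bisect_right


def piecewise_linear(x, xs, ys):
    """Evaluate the piecewise-linear function through points (xs[i], ys[i]) at x.

    xs must be strictly increasing; x outside [xs[0], xs[-1]] is clamped
    to the first/last value.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1          # segment index with xs[i] <= x < xs[i+1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def warp_envelope(envelope, src_anchors, tgt_anchors):
    """Frequency-warp a spectral envelope sampled on bins 0 .. n-1.

    Hypothetical warping: source position src_anchors[i] is mapped to target
    position tgt_anchors[i], with linear interpolation in between. Both anchor
    lists must be strictly increasing and should span 0 .. n-1.
    """
    n = len(envelope)
    bins = list(range(n))
    warped = []
    for k in range(n):
        # Inverse warp: a monotone piecewise-linear map is inverted by
        # swapping the roles of the source and target anchor lists.
        src_pos = piecewise_linear(k, tgt_anchors, src_anchors)
        # Read the source envelope at that (possibly fractional) bin position.
        warped.append(piecewise_linear(src_pos, bins, envelope))
    return warped


if __name__ == "__main__":
    env = [0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    # Move the spectral peak at bin 2 to bin 3, keeping bins 0 and 7 fixed.
    print(warp_envelope(env, [0, 2, 7], [0, 3, 7]))
```

With anchors `[0, 2, 7]` mapped to `[0, 3, 7]`, the peak at bin 2 of the 8-bin envelope moves to bin 3. Evaluating the inverse warp for each output bin, rather than pushing source bins forward, guarantees that every output bin receives a value with no gaps.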
Pages: 1642-1646
Number of pages: 5