The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion

Cited by: 4
Authors
Chen, Ling-Hui [1 ,2 ]
Liu, Li-Juan [2 ]
Ling, Zhen-Hua [1 ]
Jiang, Yuan [2 ]
Dai, Li-Rong [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] IFLYTEK Res, Hefei, Anhui, Peoples R China
Keywords
voice conversion; frequency warping; DNN; RNN; LSTM;
DOI
10.21437/Interspeech.2016-456
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
This paper describes the methods we adopted to build our system for the Voice Conversion Challenge (VCC) 2016 evaluation. We propose neural network-based approaches for converting both spectral and excitation features. First, a generatively trained deep neural network (GTDNN) is adopted for spectral envelope conversion after the spectral envelopes have been pre-processed by frequency warping. Second, we propose a recurrent neural network (RNN) with long short-term memory (LSTM) cells for F0 trajectory conversion. In addition, we adopt a DNN for band aperiodicity conversion. Both internal tests and the formal VCC evaluation results demonstrate the effectiveness of the proposed methods.
Pages: 1642 - 1646
Number of pages: 5
Related papers
50 in total
  • [21] Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels
    Le Moine, Clement
    Obin, Nicolas
    Roebel, Axel
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 36 - 40
  • [22] Emotional Voice Conversion with Adaptive Scales F0 based on Wavelet Transform using Limited Amount of Emotional Data
    Luo, Zhaojie
    Chen, Jinhui
    Takiguchi, Tetsuya
    Ariki, Yasuo
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3399 - 3403
  • [23] Direct F0 Estimation with Neural-Network-based Regression
    Xu, Shuzhuang
    Shimodaira, Hiroshi
    INTERSPEECH 2019, 2019, : 1995 - 1999
  • [24] DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion
    Watanabe, Chihiro
    Kameoka, Hirokazu
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1167 - 1171
  • [25] Emotional Voice Conversion Using Dual Supervised Adversarial Networks With Continuous Wavelet Transform F0 Features
    Luo, Zhaojie
    Chen, Jinhui
    Takiguchi, Tetsuya
    Ariki, Yasuo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (10) : 1535 - 1548
  • [26] Continuous vocoder applied in deep neural network based voice conversion
    Al-Radhi, Mohammed Salah
    Csapo, Tamas Gabor
    Nemeth, Geza
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (23) : 33549 - 33572
  • [27] A Probabilistic Interpretation for Artificial Neural Network-based Voice Conversion
    Hwang, Hsin-Te
    Tsao, Yu
    Wang, Hsin-Min
    Wang, Yih-Ru
    Chen, Sin-Horng
    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 552 - 558
  • [28] Continuous vocoder applied in deep neural network based voice conversion
    Mohammed Salah Al-Radhi
    Tamás Gábor Csapó
    Géza Németh
    Multimedia Tools and Applications, 2019, 78 : 33549 - 33572
  • [29] Vocal Tract Spectrum Transformation Based on Clustering in Voice Conversion System
    Xie Weichao
    Zhang Linghua
    PROCEEDING OF THE IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2012, : 236 - 240
  • [30] Neutral-to-emotional voice conversion with cross-wavelet transform F0 using generative adversarial networks
    Luo, Zhaojie
    Chen, Jinhui
    Takiguchi, Tetsuya
    Ariki, Yasuo
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2019, 8 : 1 - 11