ATTENTION-BASED WAVENET AUTOENCODER FOR UNIVERSAL VOICE CONVERSION

Citations: 0
Authors
Polyak, Adam [1]
Wolf, Lior
Affiliations
[1] Facebook AI Res, Cambridge, MA 02142 USA
DOI
10.1109/icassp.2019.8682589
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present a method for converting any voice to a target voice. The method is based on a WaveNet autoencoder, with the addition of a novel attention component that supports the modification of timing between the input and the output samples. Training the attention is done in an unsupervised way, by teaching the neural network to recover the original timing from an artificially modified one. Adding a generic voice robot, which we convert to the target voice, we present a robust Text To Speech pipeline that is able to train without any transcript. Our experiments show that the proposed method is able to recover the timing of the speaker and that the proposed pipeline provides a competitive Text To Speech method.
Pages: 6800 - 6804
Page count: 5
Related papers
50 in total
  • [31] A Voice Conversion Mapping Function based on a Stacked Joint-Autoencoder
    Mohammadi, Seyed Hamidreza
    Kain, Alexander
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1647 - 1651
  • [32] ATTENTION-BASED END-TO-END SPEECH RECOGNITION ON VOICE SEARCH
    Shan, Changhao
    Zhang, Junbo
    Wang, Yujun
    Xie, Lei
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4764 - 4768
  • [33] A COMPACT FRAMEWORK FOR VOICE CONVERSION USING WAVENET CONDITIONED ON PHONETIC POSTERIORGRAMS
    Lu, Hui
    Wu, Zhiyong
    Li, Runnan
    Kang, Shiyin
    Jia, Jia
    Meng, Helen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6810 - 6814
  • [34] A Dual Attention-Based Autoencoder Model for Fetal ECG Extraction From Abdominal Signals
    Ghonchi, Hamidreza
    Abolghasemi, Vahid
    IEEE SENSORS JOURNAL, 2022, 22 (23) : 22908 - 22918
  • [35] Simultaneous Pipe Leak Detection and Localization Using Attention-Based Deep Learning Autoencoder
    Karimanzira, Divas
    ELECTRONICS, 2023, 12 (22)
  • [36] A BLSTM and WaveNet-Based Voice Conversion Method With Waveform Collapse Suppression by Post-Processing
    Miao, Xiaokong
    Zhang, Xiongwei
    Sun, Meng
    Zheng, Changyan
    Cao, Tieyong
    IEEE ACCESS, 2019, 7 : 54321 - 54329
  • [37] A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data
    Tian, Xiaohai
    Chng, Eng Siong
    Li, Haizhou
    INTERSPEECH 2019, 2019, : 201 - 205
  • [38] An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2020, 9 (01)
  • [39] Attention-based vector quantisation variational autoencoder for colour-patterned fabrics defect detection
    Zhang, Hongwei
    Qiao, Guanhua
    Liu, Shuting
    Lyu, Yuting
    Yao, Le
    Ge, Zhiqiang
    COLORATION TECHNOLOGY, 2023, 139 (03) : 223 - 238
  • [40] A Multivariate Anomaly Detector for Satellite Telemetry Data Using Temporal Attention-Based LSTM Autoencoder
    Xu, Zhaoping
    Cheng, Zhijun
    Guo, Bo
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72