ATTENTION-BASED WAVENET AUTOENCODER FOR UNIVERSAL VOICE CONVERSION

Cited by: 0
Authors
Polyak, Adam [1]
Wolf, Lior
Affiliations
[1] Facebook AI Res, Cambridge, MA 02142 USA
DOI
10.1109/icassp.2019.8682589
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
We present a method for converting any voice to a target voice. The method is based on a WaveNet autoencoder, with the addition of a novel attention component that supports the modification of timing between the input and the output samples. Training the attention is done in an unsupervised way, by teaching the neural network to recover the original timing from an artificially modified one. Adding a generic voice robot, which we convert to the target voice, we present a robust Text To Speech pipeline that is able to train without any transcript. Our experiments show that the proposed method is able to recover the timing of the speaker and that the proposed pipeline provides a competitive Text To Speech method.
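The abstract says the attention component is trained by recovering the original timing from an artificially modified one. A minimal sketch of how such a training pair might be built is shown below. It is an illustration under stated assumptions, not the authors' procedure: the function name `warp_timing`, the segment count, the rate range, and the use of linear interpolation are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def warp_timing(signal, num_segments=4, min_rate=0.5, max_rate=1.5, seed=None):
    """Stretch or compress each segment of `signal` by a random rate.

    Hypothetical helper: the segment count and rate range are assumptions.
    """
    rng = np.random.default_rng(seed)
    segments = np.array_split(np.asarray(signal, dtype=float), num_segments)
    warped = []
    for seg in segments:
        if len(seg) == 0:
            continue  # array_split may yield empty pieces for short inputs
        rate = rng.uniform(min_rate, max_rate)
        new_len = max(1, int(round(len(seg) * rate)))
        # Resample the segment onto new_len evenly spaced positions.
        old_pos = np.arange(len(seg))
        new_pos = np.linspace(0, len(seg) - 1, new_len)
        warped.append(np.interp(new_pos, old_pos, seg))
    return np.concatenate(warped)

# One self-supervised pair: a model would receive `warped` as input and be
# trained to reconstruct `original`, so no transcript or alignment is needed.
original = np.sin(np.linspace(0, 20 * np.pi, 1600))
warped = warp_timing(original, seed=0)
```

Because the target is the unmodified recording itself, the supervision signal comes for free from the data, which is consistent with the abstract's description of the training as unsupervised.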
Pages: 6800 - 6804
Page count: 5