ATTENTION-BASED WAVENET AUTOENCODER FOR UNIVERSAL VOICE CONVERSION

Citations: 0
Authors
Polyak, Adam [1]
Wolf, Lior
Affiliations
[1] Facebook AI Res, Cambridge, MA 02142 USA
DOI
10.1109/icassp.2019.8682589
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present a method for converting any voice to a target voice. The method is based on a WaveNet autoencoder, with the addition of a novel attention component that supports the modification of timing between the input and the output samples. Training the attention is done in an unsupervised way, by teaching the neural network to recover the original timing from an artificially modified one. Adding a generic voice robot, which we convert to the target voice, we present a robust Text To Speech pipeline that is able to train without any transcript. Our experiments show that the proposed method is able to recover the timing of the speaker and that the proposed pipeline provides a competitive Text To Speech method.
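The abstract says the attention component is trained by recovering the original timing from an artificially modified one, but it does not specify how the timing is modified. As a minimal sketch of one way such training pairs could be built, the example below randomly stretches or compresses consecutive segments of a frame sequence. The function name `warp_timing`, its parameters, and the nearest-neighbour resampling are all assumptions made for illustration, not the authors' procedure.

```python
import random


def warp_timing(frames, segment_len=4, min_rate=0.5, max_rate=1.5, seed=None):
    """Randomly stretch or compress consecutive segments of a frame sequence.

    Hypothetical helper for illustration only.
    Returns (warped, source_index), where source_index[i] is the position in
    `frames` that warped[i] was copied from.
    """
    rng = random.Random(seed)
    warped, source_index = [], []
    for start in range(0, len(frames), segment_len):
        segment = frames[start:start + segment_len]
        # Assumed augmentation: one random speed factor per segment.
        rate = rng.uniform(min_rate, max_rate)
        new_len = max(1, round(len(segment) * rate))
        for j in range(new_len):
            # Nearest-neighbour resampling within the segment.
            k = min(len(segment) - 1, int(j * len(segment) / new_len))
            warped.append(segment[k])
            source_index.append(start + k)
    return warped, source_index


# Stand-in "frames": integers instead of real acoustic features.
frames = list(range(12))
warped, source_index = warp_timing(frames, seed=0)
```

Under these assumptions, a training pair would use `warped` as the input and `frames` as the target, so a model with attention has to learn the mapping recorded in `source_index` without any timing labels.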
Pages: 6800-6804
Page count: 5