ATTENTION-BASED WAVENET AUTOENCODER FOR UNIVERSAL VOICE CONVERSION

Cited by: 0

Authors:

Polyak, Adam [1]

Wolf, Lior

Affiliations:

[1] Facebook AI Res, Cambridge, MA 02142 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

DOI:

10.1109/icassp.2019.8682589

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403

Abstract:

We present a method for converting any voice to a target voice. The method is based on a WaveNet autoencoder, with the addition of a novel attention component that supports the modification of timing between the input and the output samples. Training the attention is done in an unsupervised way, by teaching the neural network to recover the original timing from an artificially modified one. Adding a generic voice robot, which we convert to the target voice, we present a robust Text To Speech pipeline that is able to train without any transcript. Our experiments show that the proposed method is able to recover the timing of the speaker and that the proposed pipeline provides a competitive Text To Speech method.

Pages: 6800-6804

Page count: 5

50 records in total

[21] The Attention-Based Autoencoder for Network Traffic Classification with Interpretable Feature Representation
Cui, Jun
Bai, Longkun
Zhang, Xiaofeng
Lin, Zhigui
Liu, Qi
SYMMETRY-BASEL, 2024, 16 (05)
[22] Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder
Tobing, Patrick Lumban
Wu, Yi-Chiao
Hayashi, Tomoki
Kobayashi, Kazuhiro
Toda, Tomoki
IEEE ACCESS, 2019, 7 : 171114 - 171125
[23] High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder
Chen, Kuan
Chen, Bo
Lai, Jiahao
Yu, Kai
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018 : 1993 - 1997
[24] AN EVALUATION OF DEEP SPECTRAL MAPPINGS AND WAVENET VOCODER FOR VOICE CONVERSION
Tobing, Patrick Lumban
Hayashi, Tomoki
Wu, Yi-Chiao
Kobayashi, Kazuhiro
Toda, Tomoki
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018 : 297 - 303
[25] Spectro-Temporal Attention-Based Voice Activity Detection
Lee, Younglo
Min, Jeongki
Han, David K.
Ko, Hanseok
IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 131 - 135
[26] A content-based recommender system using stacked LSTM and an attention-based autoencoder
Saini K.
Singh A.
Measurement: Sensors, 2024, 31
[27] AAANE: Attention-Based Adversarial Autoencoder for Multi-scale Network Embedding
Sang, Lei
Xu, Min
Qian, Shengsheng
Wu, Xindong
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2019, PT III, 2019, 11441 : 3 - 14
[28] Cognitive Workload Estimation Using Variational Autoencoder and Attention-Based Deep Model
Chakladar, Debashis Das
Datta, Sumalyo
Roy, Partha Pratim
Prasad, Vinod A.
IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (02) : 581 - 590
[30] DeMAAE: deep multiplicative attention-based autoencoder for identification of peculiarities in video sequences
Aslam, Nazia
Kolekar, Maheshkumar H.
VISUAL COMPUTER, 2024, 40 (03) : 1729 - 1743
