ATTENTION-BASED WAVENET AUTOENCODER FOR UNIVERSAL VOICE CONVERSION

Cited by: 0

Authors:

Polyak, Adam [1]

Wolf, Lior

Affiliations:

[1] Facebook AI Res, Cambridge, MA 02142 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

DOI:

10.1109/icassp.2019.8682589

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403

Abstract:

We present a method for converting any voice to a target voice. The method is based on a WaveNet autoencoder, with the addition of a novel attention component that supports the modification of timing between the input and the output samples. Training the attention is done in an unsupervised way, by teaching the neural network to recover the original timing from an artificially modified one. Adding a generic voice robot, which we convert to the target voice, we present a robust Text To Speech pipeline that is able to train without any transcript. Our experiments show that the proposed method is able to recover the timing of the speaker and that the proposed pipeline provides a competitive Text To Speech method.
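The abstract says the attention component is trained by recovering the original timing from an artificially modified one. A minimal sketch of how such a training pair might be built is shown here. It is an illustration only, not the authors' procedure: the piecewise random resampling, the function names `resample_linear` and `random_time_warp`, and the parameters `num_segments`, `min_rate`, and `max_rate` are all assumptions that do not come from this record.

```python
import random


def resample_linear(segment, new_length):
    """Resample a list of floats to new_length samples by linear interpolation."""
    if new_length <= 0 or not segment:
        return []
    if len(segment) == 1 or new_length == 1:
        return [segment[0]] * new_length
    # Map output index i onto a fractional position in the input segment.
    scale = (len(segment) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1.0 - frac) + segment[hi] * frac)
    return out


def random_time_warp(signal, num_segments=4, min_rate=0.5, max_rate=1.5, seed=None):
    """Stretch or compress each segment of the signal by its own random rate.

    Hypothetical augmentation: returns the warped signal and the rate used
    for each segment. The unwarped signal serves as the recovery target.
    """
    rng = random.Random(seed)
    seg_len = max(1, len(signal) // num_segments)
    warped, rates = [], []
    for start in range(0, len(signal), seg_len):
        segment = signal[start:start + seg_len]
        rate = rng.uniform(min_rate, max_rate)
        new_length = max(1, round(len(segment) * rate))
        warped.extend(resample_linear(segment, new_length))
        rates.append(rate)
    return warped, rates
```

Under these assumptions, a call such as `warped, rates = random_time_warp(signal, seed=0)` yields the pair `(warped, signal)`: the model would receive `warped` as input and be asked to reproduce `signal`, so no transcript or manual alignment is needed to supervise the timing.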

Pages: 6800-6804

Page count: 5
