ATTENTION-BASED WAVENET AUTOENCODER FOR UNIVERSAL VOICE CONVERSION

Cited by: 0

Authors:

Polyak, Adam [1]

Wolf, Lior

Affiliations:

[1] Facebook AI Res, Cambridge, MA 02142 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

DOI:

10.1109/icassp.2019.8682589

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403

Abstract:

We present a method for converting any voice to a target voice. The method is based on a WaveNet autoencoder, with the addition of a novel attention component that supports the modification of timing between the input and the output samples. Training the attention is done in an unsupervised way, by teaching the neural network to recover the original timing from an artificially modified one. Adding a generic voice robot, which we convert to the target voice, we present a robust Text To Speech pipeline that is able to train without any transcript. Our experiments show that the proposed method is able to recover the timing of the speaker and that the proposed pipeline provides a competitive Text To Speech method.
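The abstract says the attention is trained by recovering the original timing from an artificially modified one, but this record does not describe how the modification is done. As a rough illustration only, the sketch below builds one (modified, original) training pair by stretching or compressing each segment of a signal at a random rate. The function name `random_time_warp`, its parameters, the segment-wise scheme, and the linear interpolation are all assumptions made for this example and should not be read as the authors' procedure.

```python
import numpy as np


def random_time_warp(signal, num_segments=4, min_rate=0.5, max_rate=1.5, seed=None):
    """Hypothetical helper: stretch or compress each segment by a random rate.

    Assumption for illustration: the signal is split into roughly equal
    segments and each one is resampled with linear interpolation.
    """
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    warped = []
    for seg in np.array_split(signal, num_segments):
        if len(seg) == 0:
            continue  # skip empty segments when the signal is very short
        rate = rng.uniform(min_rate, max_rate)
        new_len = max(1, int(round(len(seg) * rate)))
        # Positions inside the original segment that the warped segment samples.
        src_pos = np.linspace(0, len(seg) - 1, new_len)
        warped.append(np.interp(src_pos, np.arange(len(seg)), seg))
    return np.concatenate(warped)


# A toy "utterance": the model input is the warped signal, the target is the original.
original = np.sin(np.linspace(0, 20, 400))
warped = random_time_warp(original, seed=0)
training_pair = (warped, original)
```

In such a setup a network would receive `warped` and be trained to reproduce `original`, so no transcript or manual alignment is needed to supervise the timing.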

Pages: 6800-6804

Page count: 5
