TVQVC: Transformer based Vector Quantized Variational Autoencoder with CTC loss for Voice Conversion

Cited by: 0
Authors
Chen, Ziyi [1, 2]
Zhang, Pengyuan [1, 2]
Affiliations
[1] Chinese Acad Sci, Key Lab Speech Acoust & Content Understanding, Inst Acoust, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
Keywords
voice conversion; vector quantization; transformer; CTC
DOI
10.21437/Interspeech.2021-1301
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
Voice conversion (VC) techniques aim to modify the speaker identity and style of an utterance while preserving its linguistic content. Although many VC methods exist, the state of the art is still a cascade of automatic speech recognition (ASR) and text-to-speech (TTS). This paper presents a new transformer-based vector-quantized autoencoder structure with CTC loss for non-parallel VC, inspired by the cascaded ASR and TTS approach. The proposed method combines CTC loss and vector quantization to obtain high-level linguistic information without speaker information. Objective and subjective evaluations on Mandarin datasets show that speech converted by the proposed model is better than the baselines in naturalness, rhythm, and speaker similarity.
Pages: 826 - 830
Page count: 5
Related papers
50 in total
  • [31] ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    Hojo, Nobukatsu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (09) : 1432 - 1443
  • [32] Voice Conversion Based on i-vector With Variational Autoencoding Relativistic Standard Generative Adversarial Network
    Li Y.-P.
    Cao P.
    Zuo Y.-T.
    Zhang Y.
    Qian B.
    Zidonghua Xuebao/Acta Automatica Sinica, 2022, 48 (07) : 1824 - 1833
  • [33] Generating High-Quality F0 Embeddings Using the Vector-Quantized Variational Autoencoder
    Portes, David
    Horak, Ales
    TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II, 2024, 15049 : 139 - 148
  • [34] Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice
    Bohy, Hugo
    Tran, Minh
    El Haddad, Kevin
    Dutoit, Thierry
    Soleymani, Mohammad
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024,
  • [35] T-DVAE: A Transformer-Based Dynamical Variational Autoencoder for Speech
    Perschewski, Jan-Ole
    Stober, Sebastian
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VII, 2024, 15022 : 33 - 46
  • [36] A Voice Conversion Mapping Function based on a Stacked Joint-Autoencoder
    Mohammadi, Seyed Hamidreza
    Kain, Alexander
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1647 - 1651
  • [37] Low-Dose CT Image Reconstruction using Vector Quantized Convolutional Autoencoder with Perceptual Loss
    Ramanathan, Shalini
    Ramasundaram, Mohan
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2023, 48 (02):
  • [38] Low-Dose CT Image Reconstruction using Vector Quantized Convolutional Autoencoder with Perceptual Loss
    Shalini Ramanathan
    Mohan Ramasundaram
    Sādhanā, 48
  • [39] Bone-conducted Speech Enhancement Using Vector-quantized Variational Autoencoder and Gammachirp Filterbank Cepstral Coefficients
    Quoc-Huy Nguyen
    Unoki, Masashi
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 21 - 25
  • [40] WEAKLY SUPERVISED MARINE ANIMAL DETECTION FROM REMOTE SENSING IMAGES USING VECTOR-QUANTIZED VARIATIONAL AUTOENCODER
    Pham, Minh-Tan
    Gangloff, Hugo
    Lefevre, Sebastien
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 5559 - 5562