Parallel vs. Non-parallel Voice Conversion for Esophageal Speech

Cited by: 4
Authors
Serrano, Luis [1 ]
Raman, Sneha [1 ]
Tavarez, David [1 ]
Navas, Eva [1 ]
Hernaez, Inma [1 ]
Affiliations
[1] Univ Basque Country UPV EHU, Leioa, Spain
Source
INTERSPEECH 2019
Funding
European Union Horizon 2020;
Keywords
voice conversion; speech and voice disorders; alaryngeal voices; speech intelligibility; TRACHEOESOPHAGEAL SPEECH; NEURAL-NETWORKS; ENHANCEMENT; TRANSFORMATION;
DOI
10.21437/Interspeech.2019-2194
CLC classification codes
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
State-of-the-art systems for voice conversion have been shown to generate highly natural-sounding converted speech. Voice conversion techniques have also been applied to alaryngeal speech, with the aim of improving its quality or its intelligibility. In this paper, we present an attempt to apply a voice conversion strategy based on phonetic posteriorgrams (PPGs), which produces very high quality converted speech, to improve the characteristics of esophageal speech. The main advantage of this PPG-based architecture lies in the fact that it is able to convert speech from any source, without the need to previously train the system with a parallel corpus. However, our results show that the PPG approach degrades the intelligibility of the converted speech considerably, especially when the input speech is already poorly intelligible. Two systems are compared: an LSTM-based one-to-one conversion system, referred to as the baseline, and the new system using phonetic posteriorgrams. Both spectral parameters and f0 are converted using DNN (Deep Neural Network) based architectures. Results from both objective and subjective evaluations are presented, showing that although ASR (Automatic Speech Recognition) errors are reduced, original esophageal speech is still preferred by subjects.
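To make the non-parallel PPG idea described in the abstract concrete, the sketch below shows a minimal PyTorch mapping from phonetic posteriorgrams (assumed to come from an external ASR acoustic model) to target-speaker spectral features with a bidirectional LSTM. This is an illustration of the general technique only, not the authors' system: the feature dimensions (40 phone classes, 60 spectral coefficients), layer sizes, and the use of a WORLD-style vocoder are assumptions for the example.

```python
# Minimal sketch of a PPG-based voice-conversion mapping (illustrative only).
# Assumed inputs: PPGs of shape (batch, frames, n_phones) from an ASR acoustic
# model; outputs: (batch, frames, n_mgc) spectral features for a vocoder.
import torch
import torch.nn as nn

class PPGToSpectralLSTM(nn.Module):
    """Maps phonetic posteriorgrams to target-speaker spectral features."""
    def __init__(self, n_phones=40, n_mgc=60, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_phones, hidden, num_layers=layers,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_mgc)

    def forward(self, ppg):                 # ppg: (batch, frames, n_phones)
        h, _ = self.lstm(ppg)               # (batch, frames, 2 * hidden)
        return self.proj(h)                 # (batch, frames, n_mgc)

# Non-parallel training: only target-speaker data is needed. PPGs extracted
# from the target speaker's own recordings are mapped back to that speaker's
# spectra, so no aligned source/target (parallel) corpus is required.
model = PPGToSpectralLSTM()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

ppg_batch = torch.rand(8, 200, 40)          # dummy posteriors (toy data)
mgc_batch = torch.randn(8, 200, 60)         # dummy target spectral features

loss = loss_fn(model(ppg_batch), mgc_batch)
loss.backward()
optim.step()
```

At conversion time, PPGs extracted from any source speaker (here, esophageal speech) would be fed through the trained network to obtain target-speaker spectra; f0 would be converted by a separate model, as the abstract notes.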
Pages: 4549-4553
Number of pages: 5