Parallel vs. Non-parallel Voice Conversion for Esophageal Speech

被引:4
|
作者
Serrano, Luis [1 ]
Raman, Sneha [1 ]
Tavarez, David [1 ]
Navas, Eva [1 ]
Hernaez, Inma [1 ]
机构
[1] Univ Basque Country UPV EHU, Leioa, Spain
来源
基金
欧盟地平线“2020”;
关键词
voice conversion; speech and voice disorders; alaryngeal voices; speech intelligibility; TRACHEOESOPHAGEAL SPEECH; NEURAL-NETWORKS; ENHANCEMENT; TRANSFORMATION;
D O I
10.21437/Interspeech.2019-2194
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
State of the art systems for voice conversion have been shown to generate highly natural sounding converted speech. Voice conversion techniques have also been applied to alaryngeal speech, with the aim of improving its quality or its intelligibility. In this paper, we present an attempt to apply a voice conversion strategy based on phonetic posteriorgrams (PPGs), which produces very high quality converted speech, to improve the characteristics of esophageal speech. The main advantage of this PPG based architecture lies in the fact that it is able to convert speech from any source, without the need to previously train the system with a parallel corpus. However, our results show that the PPG approach degrades the intelligibility of the converted speech considerably, especially when the input speech is already poorly intelligible. In this paper two systems are compared, an LSTM based one-to-one conversion system, which is referred to as the baseline, and the new system using phonetic posteriorgrams. Both spectral parameters and f(0) are converted using DNN (Deep Neural Network) based architectures. Results from both objective and subjective evaluations are presented, showing that although ASR (Automated Speech Recognition) errors are reduced, original esophageal speech is still preferred by subjects.
引用
收藏
页码:4549 / 4553
页数:5
相关论文
共 50 条
  • [41] Non-parallel Voice Conversion Based on Perceptual Star Generative Adversarial Network
    Li, Yanping
    Qiu, Xiangtian
    Cao, Pan
    Zhang, Yan
    Bao, Bingkun
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (08) : 4632 - 4648
  • [42] Non-parallel Voice Conversion Based on Perceptual Star Generative Adversarial Network
    Yanping Li
    Xiangtian Qiu
    Pan Cao
    Yan Zhang
    Bingkun Bao
    Circuits, Systems, and Signal Processing, 2022, 41 : 4632 - 4648
  • [43] A STUDY ON COMBINING NON-PARALLEL AND PARALLEL METHODOLOGIES FOR MANDARIN-ENGLISH CROSS-LINGUAL VOICE CONVERSION
    You, Chang Huai
    Dong, Minghui
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024), 2024, : 10491 - 10495
  • [44] Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification
    Saito, Yuki
    Nakamura, Taiki
    Ijima, Yusuke
    Nishida, Kyosuke
    Takamichi, Shinnosuke
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2021, 42 (01) : 1 - 11
  • [45] Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition
    Takashima, Yuki
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (01)
  • [46] Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition
    Yuki Takashima
    Toru Nakashika
    Tetsuya Takiguchi
    Yasuo Ariki
    EURASIP Journal on Audio, Speech, and Music Processing, 2019
  • [47] Non-parallel Voice Conversion with Fewer Labeled Data by Conditional Generative Adversarial Networks
    Chen, Minchuan
    Hou, Weijian
    Ma, Jun
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2020, 2020, : 4716 - 4720
  • [48] Enhanced Variational Auto-encoder for Voice Conversion Using Non-parallel Corpora
    Huang Guojie
    Jin Hui
    Yu Yibiao
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 46 - 49
  • [49] ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder
    Katneoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    Hojo, Nobukatsu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (09) : 1432 - 1443
  • [50] Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion
    Shah, Nirmesh J.
    Madhavi, Maulik C.
    Patil, Hemant A.
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1968 - 1972