Assessment of Self-Supervised Denoising Methods for Esophageal Speech Enhancement

Cited by: 0
Authors
Amarjouf, Madiha [1]
Ibn Elhaj, El Hassan [1]
Chami, Mouhcine [2]
Ezzine, Kadria [3]
Di Martino, Joseph [3]
Affiliations
[1] Natl Inst Posts & Telecommun INPT, Res Lab Telecommun Syst Networks & Serv STRS, Res Team Multimedia Signal & Commun Syst MUSICS, Ave Allal Fassi, Rabat 10112, Morocco
[2] Natl Inst Posts & Telecommun INPT, Res Lab Telecommun Syst Networks & Serv STRS, Res Team Secure & Mixed Architecture Reliable Tech, Ave Allal Fassi, Rabat 10112, Morocco
[3] LORIA Lab Lorrain Rech Informat & Ses Applicat, BP 239, F-54506 Vandoeuvre Les Nancy, France
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 15
Keywords
esophageal speech; self-supervised denoising; speech enhancement; DCUNET; DCUNET-cTSTM; STFT; VoiceFixer; VOICE CONVERSION;
DOI
10.3390/app14156682
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Esophageal speech (ES) is a pathological voice that is often difficult to understand. Moreover, recordings of a patient's voice before a laryngectomy are difficult to acquire, which complicates enhancing this kind of voice. For this reason, most supervised methods used to enhance ES are based on voice conversion, which uses healthy speakers as targets and may therefore not preserve the speaker's identity. Unsupervised methods for ES, in turn, are mostly based on traditional filters, which alone cannot remove this kind of noise and are known for producing musical artifacts. To address these issues, a self-supervised method based on the Only-Noisy-Training (ONT) model was applied, which denoises a signal without needing a clean target. Four experiments were conducted using Deep Complex UNET (DCUNET) and Deep Complex UNET with Complex Two-Stage Transformer Module (DCUNET-cTSTM) for assessment; both models are based on the ONT approach. For comparison purposes and to calculate the evaluation metrics, the pre-trained VoiceFixer model was used to restore the clean wave files of esophageal speech. Although ONT-based methods work better with noisy wave files, the results show that ES can be denoised without the need for clean targets, and hence the speaker's identity is retained.
Pages: 14
Related papers
50 in total
  • [31] Self-supervised learning enhancement and detection methods for nocturnal animal images
    Wang, Chi
    Shen, Chen
    Huang, Qing
    Zhang, Guo-feng
    Lu, Han
    Chen, Jin-bo
    CHINESE OPTICS, 2024, 17 (05) : 1087 - 1097
  • [32] Self-Supervised Seismic Resolution Enhancement
    Cheng, Shijun
    Zhang, Haoran
    Alkhalifah, Tariq
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [33] PERCEPTUAL LOSS BASED SPEECH DENOISING WITH AN ENSEMBLE OF AUDIO PATTERN RECOGNITION AND SELF-SUPERVISED MODELS
    Kataria, Saurabh
    Villalba, Jesus
    Dehak, Najim
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7118 - 7122
  • [34] Image denoising for fluorescence microscopy by supervised to self-supervised transfer learning
    Wang, Yina
    Pinkard, Henry
    Khwaja, Emaad
    Zhou, Shuqin
    Waller, Laura
    Huang, Bo
    OPTICS EXPRESS, 2021, 29 (25) : 41303 - 41312
  • [35] Self-Supervised Denoising for Real Satellite Hyperspectral Imagery
    Qin, Jinchun
    Zhao, Hongrui
    Liu, Bing
    REMOTE SENSING, 2022, 14 (13)
  • [36] Self-Supervised Learning for Generic Raman Spectrum Denoising
    Wu, Siyi
    Zhang, Yumin
    He, Chang
    Luo, Zhewen
    Chen, Zhou
    Ye, Jian
    ANALYTICAL CHEMISTRY, 2024, 96 (44) : 17476 - 17485
  • [37] Self-Supervised Learning for Action Recognition by Video Denoising
    Thi Thu Trang Phung
    Thi Hong Thu Ma
    Van Truong Nguyen
    Duc Quang Vu
    2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021), 2021, : 76 - 81
  • [38] A self-supervised network for image denoising and watermark removal
    Tian, Chunwei
    Xiao, Jingyu
    Zhang, Bob
    Zuo, Wangmeng
    Zhang, Yudong
    Lin, Chia-Wen
    NEURAL NETWORKS, 2024, 174
  • [39] Self-Supervised Learning for Seismic Data Reconstruction and Denoising
    Meng, Fanlei
    Fan, QinYin
    Li, Yue
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [40] Self-Supervised Pretraining Transformer for Seismic Data Denoising
    Wang, Hongzhou
    Lin, Jun
    Li, Yue
    Dong, Xintong
    Tong, Xunqian
    Lu, Shaoping
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 25