Assessment of Self-Supervised Denoising Methods for Esophageal Speech Enhancement

Times cited: 0
Authors
Amarjouf, Madiha [1]
Ibn Elhaj, El Hassan [1]
Chami, Mouhcine [2]
Ezzine, Kadria [3]
Di Martino, Joseph [3]
Affiliations
[1] Natl Inst Posts & Telecommun INPT, Res Lab Telecommun Syst Networks & Serv STRS, Res Team Multimedia Signal & Commun Syst MUSICS, Ave Allal Fassi, Rabat 10112, Morocco
[2] Natl Inst Posts & Telecommun INPT, Res Lab Telecommun Syst Networks & Serv STRS, Res Team Secure & Mixed Architecture Reliable Tech, Ave Allal Fassi, Rabat 10112, Morocco
[3] LORIA Lab Lorrain Rech Informat & Ses Applicat, BP 239, F-54506 Vandoeuvre Les Nancy, France
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 15
Keywords
esophageal speech; self-supervised denoising; speech enhancement; DCUNET; DCUNET-cTSTM; STFT; VoiceFixer; voice conversion
DOI
10.3390/app14156682
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Esophageal speech (ES) is a pathological voice that is often difficult to understand. Moreover, recordings of a patient's voice from before a laryngectomy are hard to obtain, which complicates enhancing this kind of voice. For this reason, most supervised methods for enhancing ES rely on voice conversion toward healthy target speakers, which may not preserve the speaker's identity. Unsupervised methods for ES, in turn, are mostly based on traditional filters, which cannot suppress this kind of noise on their own and are known to produce musical artifacts. To address these issues, a self-supervised method based on the Only-Noisy-Training (ONT) model was applied, which denoises a signal without needing a clean target. Four experiments were conducted using Deep Complex UNET (DCUNET) and Deep Complex UNET with Complex Two-Stage Transformer Module (DCUNET-cTSTM) for assessment; both models are based on the ONT approach. In addition, for comparison purposes and to calculate the evaluation metrics, the pre-trained VoiceFixer model was used to restore the clean wave files of esophageal speech. Although ONT-based methods work better with noisy wave files, the results show that ES can be denoised without clean targets, and hence the speaker's identity is retained.
Pages: 14
Related papers
  • [1] SELF-SUPERVISED DENOISING AUTOENCODER WITH LINEAR REGRESSION DECODER FOR SPEECH ENHANCEMENT
    Zezario, Ryandhimas E.
    Hussain, Tassadaq
    Lu, Xugang
    Wang, Hsin-Min
    Tsao, Yu
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6669 - 6673
  • [2] Boosting Self-Supervised Embeddings for Speech Enhancement
    Hung, Kuo-Hsuan
    Fu, Szu-Wei
    Tseng, Huan-Hsin
    Chiang, Hsin-Tien
    Tsao, Yu
    Lin, Chii-Wann
    INTERSPEECH 2022, 2022, : 186 - 190
  • [3] INVESTIGATING SELF-SUPERVISED LEARNING FOR SPEECH ENHANCEMENT AND SEPARATION
    Huang, Zili
    Watanabe, Shinji
    Yang, Shu-wen
    Garcia, Paola
    Khudanpur, Sanjeev
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6837 - 6841
  • [4] Self-supervised PET Denoising
    Yie, Si Young
    Kang, Seung Kwan
    Hwang, Donghwi
    Lee, Jae Sung
    NUCLEAR MEDICINE AND MOLECULAR IMAGING, 2020, 54 (06) : 299 - 304
  • [5] Self-supervised PET Denoising
    Si Young Yie
    Seung Kwan Kang
    Donghwi Hwang
    Jae Sung Lee
    Nuclear Medicine and Molecular Imaging, 2020, 54 : 299 - 304
  • [6] Deep Self-Supervised Learning of Speech Denoising from Noisy Speeches
    Sanada, Yutaro
    Nakagawa, Takumi
    Wada, Yuichiro
    Takanashi, Kosaku
    Zhang, Yuhui
    Tokuyama, Kiichi
    Kanamori, Takafumi
    Yamada, Tomonori
    INTERSPEECH 2022, 2022, : 1178 - 1182
  • [7] Self-supervised speech denoising using only noisy audio signals
    Wu, Jiasong
    Li, Qingchun
    Yang, Guanyu
    Li, Lei
    Senhadji, Lotfi
    Shu, Huazhong
    SPEECH COMMUNICATION, 2023, 149 : 63 - 73
  • [8] Joint Self-Supervised Enhancement and Denoising of Low-Light Images
    Yu, Ting
    Wang, Shuai
    Chen, Wei
    Yu, F. Richard
    Leung, Victor C. M.
    Tian, Zijian
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (02) : 1800 - 1813
  • [9] Efficient Personalized Speech Enhancement Through Self-Supervised Learning
    Sivaraman, Aswin
    Kim, Minje
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1342 - 1356
  • [10] Self-supervised Bone Scan Denoising
    Yie, Si Young
    Kang, Seung Kwan
    Hwang, Donghwi
    Choi, Hongyoon
    Lee, Jae Sung
    JOURNAL OF NUCLEAR MEDICINE, 2021, 62