Deep Learning-Based Coding Strategy for Improved Cochlear Implant Speech Perception in Noisy Environments

Cited by: 0
Authors
Essaid, Billel [1 ]
Kheddar, Hamza [1 ]
Batel, Noureddine [1 ]
Chowdhury, Muhammad E. H. [2 ]
Affiliations
[1] Univ MEDEA, Elect Engn Dept, LSEA Lab, Medea 26000, Algeria
[2] Qatar Univ, Dept Elect Engn, Doha, Qatar
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Speech enhancement; Noise measurement; Noise reduction; Noise; Convolutional neural networks; Autoencoders; Biological system modeling; Training; Real-time systems; Feature extraction; Cochlear implant; deep learning; sound coding strategy; speech enhancement; transformer; NEURAL-NETWORK; DENOISING AUTOENCODER; INTELLIGIBILITY; ENHANCEMENT;
DOI
10.1109/ACCESS.2025.3542953
CLC number
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Automatic speech recognition (ASR) and speech enhancement are essential tools in modern life, aiding not only in machine interaction but also in supporting individuals with hearing impairments. These processes begin with capturing speech in analog form and applying signal processing algorithms to ensure compatibility with devices like cochlear implants (CIs). However, CIs, with their limited number of electrodes, often cause speech distortion, and despite advancements in state-of-the-art signal processing techniques, challenges persist, particularly in noisy environments with multiple speech sources. The rise of artificial intelligence (AI) has introduced innovative strategies to address these limitations. This paper presents a novel deep learning (DL)-based technique that leverages attention mechanisms to improve speech intelligibility through noise suppression. The proposed approach includes two strategies: the first integrates temporal convolutional networks (TCNs) and multi-head attention (MHA) layers to capture both local and global dependencies within the speech signal, enabling precise noise filtering and improved clarity. The second strategy builds on this framework by additionally incorporating bidirectional gated recurrent units (Bi-GRU) alongside the TCN and MHA layers, further refining sequence modeling and enhancing noise reduction. The optimal model configuration, using TCN-MHA-Bi-GRU with a kernel size of 16, achieved a compact model size of 788K parameters and recorded training and validation losses of 0.0350 and 0.0446, respectively.
Experimental results on the TIMIT and Harvard Sentences datasets, enriched with diverse noise sources from the DEMAND database, yielded high intelligibility scores with a short-time objective intelligibility (STOI) of 0.8345, a word recognition score (WRS) of 99.2636, and a linear correlation coefficient (LCC) of 0.9607, underscoring the model's capability to enhance speech perception in noisy CI environments, ensuring a balance between model size and speech quality, and surpassing existing state-of-the-art techniques.
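The pipeline the abstract describes (TCN layers for local context, MHA for global dependencies, a Bi-GRU for sequence refinement) can be sketched roughly as below. This is a minimal illustrative sketch in PyTorch, not the paper's implementation: the feature dimension, number of blocks, head count, and the masking-style output are assumptions chosen for readability, and the real model's 788K-parameter configuration and kernel size of 16 are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TCNBlock(nn.Module):
    """Simplified dilated causal 1-D convolution block with a residual path."""
    def __init__(self, channels, kernel_size=8, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-pad so the conv is causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                         # x: (batch, channels, time)
        y = self.conv(F.pad(x, (self.pad, 0)))    # causal padding keeps length
        return F.relu(y) + x                      # residual connection

class DenoiserSketch(nn.Module):
    """TCN -> multi-head attention -> Bi-GRU -> mask, per the abstract's outline."""
    def __init__(self, feat_dim=64, heads=4):
        super().__init__()
        self.tcn = nn.Sequential(TCNBlock(feat_dim, dilation=1),
                                 TCNBlock(feat_dim, dilation=2))
        self.mha = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.gru = nn.GRU(feat_dim, feat_dim // 2, bidirectional=True,
                          batch_first=True)       # outputs feat_dim features
        self.out = nn.Linear(feat_dim, feat_dim)

    def forward(self, x):                         # x: (batch, time, feat_dim)
        h = self.tcn(x.transpose(1, 2)).transpose(1, 2)  # local dependencies
        h, _ = self.mha(h, h, h)                  # global (long-range) context
        h, _ = self.gru(h)                        # bidirectional refinement
        return torch.sigmoid(self.out(h)) * x     # masking-style enhancement

noisy = torch.randn(2, 100, 64)                   # (batch, frames, features)
enhanced = DenoiserSketch()(noisy)
print(enhanced.shape)                             # torch.Size([2, 100, 64])
```

The sigmoid mask is one common choice for DL-based enhancement (the network predicts per-feature gains in (0, 1) applied to the noisy input); whether the paper uses masking or direct spectral mapping is not stated in the abstract.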
Pages: 35707-35732 (26 pages)