Deep Learning-Based Coding Strategy for Improved Cochlear Implant Speech Perception in Noisy Environments

Cited by: 0
Authors
Essaid, Billel [1]
Kheddar, Hamza [1]
Batel, Noureddine [1]
Chowdhury, Muhammad E. H. [2]
Affiliations
[1] Univ MEDEA, Elect Engn Dept, LSEA Lab, Medea 26000, Algeria
[2] Qatar Univ, Dept Elect Engn, Doha, Qatar
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Speech enhancement; Noise measurement; Noise reduction; Noise; Convolutional neural networks; Autoencoders; Biological system modeling; Training; Real-time systems; Feature extraction; Cochlear implant; deep learning; sound coding strategy; speech enhancement; transformer; NEURAL-NETWORK; DENOISING AUTOENCODER; INTELLIGIBILITY; ENHANCEMENT;
DOI
10.1109/ACCESS.2025.3542953
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Automatic speech recognition (ASR) and speech enhancement are essential tools in modern life, aiding not only in machine interaction but also in supporting individuals with hearing impairments. These processes begin with capturing speech in analog form and applying signal processing algorithms to ensure compatibility with devices like cochlear implants (CIs). However, CIs, with their limited number of electrodes, often cause speech distortion, and despite advancements in state-of-the-art signal processing techniques, challenges persist, particularly in noisy environments with multiple speech sources. The rise of artificial intelligence (AI) has introduced innovative strategies to address these limitations. This paper presents a novel deep learning (DL)-based technique that leverages attention mechanisms to improve speech intelligibility through noise suppression. The proposed approach includes two strategies. The first integrates temporal convolutional networks (TCNs) and multi-head attention (MHA) layers to capture both local and global dependencies within the speech signal, enabling precise noise filtering and improved clarity. The second builds on this framework by additionally incorporating bidirectional gated recurrent units (Bi-GRU) alongside the TCN and MHA layers, further refining sequence modeling and enhancing noise reduction. The optimal model configuration, TCN-MHA-Bi-GRU with a kernel size of 16, achieved a compact model size of 788K parameters and recorded training and validation losses of 0.0350 and 0.0446, respectively. Experimental results on the TIMIT and Harvard Sentences datasets, enriched with diverse noise sources from the DEMAND database, yielded high intelligibility scores: a short-time objective intelligibility (STOI) of 0.8345, a word recognition score (WRS) of 99.2636, and a linear correlation coefficient (LCC) of 0.9607. These results underscore the model's capability to enhance speech perception in noisy CI environments while balancing model size against speech quality, and they surpass existing state-of-the-art techniques.
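As an illustration of one of the reported metrics, the following is a minimal sketch of a linear correlation coefficient, assuming LCC here means the Pearson correlation between a clean reference and the processed output. The abstract does not say how the paper computes it, so the function name `linear_correlation` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def linear_correlation(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Centre both sequences so the coefficient ignores constant offsets.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / norm


# Hypothetical envelope samples (illustrative only, not data from the paper).
clean = [0.10, 0.40, 0.80, 0.50, 0.20]
enhanced = [0.12, 0.35, 0.75, 0.55, 0.18]
print(round(linear_correlation(clean, enhanced), 4))
```

A value close to 1 would indicate that the processed signal tracks the reference almost linearly, which is the sense in which a higher LCC is read as better.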
Pages: 35707 - 35732
Page count: 26
Related papers
50 in total
  • [31] Deep Learning-Based Computer Vision Methods for Complex Traffic Environments Perception: A Review
    Talha Azfar
    Jinlong Li
    Hongkai Yu
    Ruey L. Cheu
    Yisheng Lv
    Ruimin Ke
    Data Science for Transportation, 2024, 6 (1):
  • [32] Evaluation of the high-resolution speech coding strategy for the Clarion CII cochlear implant system
    Ostroff, JM
    David, EA
    Shipp, DB
    Chen, JM
    Nedzelski, JM
    JOURNAL OF OTOLARYNGOLOGY, 2003, 32 (02) : 81 - 86
  • [33] On the Robustness of Deep Learning-Based Speech Enhancement
    Chhetri, Amit S.
    Hilmes, Philip
    Athi, Mrudula
    Shankar, Nikhil
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 1587 - 1594
  • [34] Improving the Performance of Speech Perception in Noisy Environment Based on an FAME Strategy
    Lai, Yin-Hui
    Wang, Syu-Siang
    Su, Yu-Ting
    Han-Che, Cheng
    Fu, Fan Kang
    Tsao, Yu
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [35] Speech understanding and listening effort in cochlear implant users - microphone beamformers lead to significant improvements in noisy environments
    Buechner, Andreas
    Schwebs, Manfred
    Lenarz, Thomas
    COCHLEAR IMPLANTS INTERNATIONAL, 2020, 21 (01) : 1 - 8
  • [36] A deep learning-based binocular perception system
    SUN Zhao
    MA Chao
    WANG Liang
    MENG Ran
    PEI Shanshan
    Journal of Systems Engineering and Electronics, 2021, 32 (01) : 7 - 20
  • [37] A deep learning-based binocular perception system
    Sun Zhao
    Ma Chao
    Wang Liang
    Meng Ran
    Pei Shanshan
    JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2021, 32 (01) : 7 - 20
  • [38] The effects of processor strategy on the speech perception performance of pediatric nucleus multichannel cochlear implant users
    Sehgal, ST
    Kirk, KI
    Svirsky, M
    Miyamoto, RT
    EAR AND HEARING, 1998, 19 (02) : 149 - 161
  • [39] Improved Speech Perception in Cochlear Implant Users With Interleaved High-Rate Pulse Trains
    Runge, Christina L.
    Du, Fang
    Hu, Yi
    OTOLOGY & NEUROTOLOGY, 2018, 39 (05) : E319 - E324
  • [40] Speech perception performance in experienced cochlear-implant patients receiving the SPEAK processing strategy in the Nucleus Spectra-22 cochlear implant
    Parkinson, AJ
    Parkinson, WS
    Tyler, RS
    Lowder, MW
    Gantz, BJ
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 1998, 41 (05) : 1073 - 1087