Deep Learning-Based Coding Strategy for Improved Cochlear Implant Speech Perception in Noisy Environments

Cited by: 0
Authors
Essaid, Billel [1]
Kheddar, Hamza [1]
Batel, Noureddine [1]
Chowdhury, Muhammad E. H. [2]
Affiliations
[1] Univ MEDEA, Elect Engn Dept, LSEA Lab, Medea 26000, Algeria
[2] Qatar Univ, Dept Elect Engn, Doha, Qatar
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Speech enhancement; Noise measurement; Noise reduction; Noise; Convolutional neural networks; Autoencoders; Biological system modeling; Training; Real-time systems; Feature extraction; Cochlear implant; deep learning; sound coding strategy; speech enhancement; transformer; NEURAL-NETWORK; DENOISING AUTOENCODER; INTELLIGIBILITY; ENHANCEMENT;
DOI
10.1109/ACCESS.2025.3542953
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Automatic speech recognition (ASR) and speech enhancement are essential tools in modern life, aiding not only in machine interaction but also in supporting individuals with hearing impairments. These processes begin with capturing speech in analog form and applying signal processing algorithms to ensure compatibility with devices like cochlear implants (CIs). However, CIs, with their limited number of electrodes, often cause speech distortion, and despite advancements in state-of-the-art signal processing techniques, challenges persist, particularly in noisy environments with multiple speech sources. The rise of artificial intelligence (AI) has introduced innovative strategies to address these limitations. This paper presents a novel deep learning (DL)-based technique that leverages attention mechanisms to improve speech intelligibility through noise suppression. The proposed approach includes two strategies: the first integrates temporal convolutional networks (TCNs) and multi-head attention (MHA) layers to capture both local and global dependencies within the speech signal, enabling precise noise filtering and improved clarity. The second strategy builds on this framework by additionally incorporating bidirectional gated recurrent units (Bi-GRU) alongside TCN and MHA layers, further refining sequence modeling and enhancing noise reduction. The optimal model configuration, using TCN-MHA-Bi-GRU with a kernel size of 16, achieved a compact model size of 788K parameters and recorded training and validation losses of 0.0350 and 0.0446, respectively. Experimental results on the TIMIT and Harvard Sentences datasets, enriched with diverse noise sources from the DEMAND database, yielded high intelligibility scores with a short-time objective intelligibility (STOI) of 0.8345, a word recognition score (WRS) of 99.2636, and a linear correlation coefficient (LCC) of 0.9607, underscoring the model's capability to enhance speech perception in noisy CI environments, ensuring a balance between model size and speech quality, and surpassing the existing state-of-the-art techniques.
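As a rough illustration of one of the reported metrics, the linear correlation coefficient can be sketched as a standard Pearson correlation between a clean reference and an enhanced output. This is only an assumption: the abstract does not state which signals the authors correlate or how they preprocess them, and the function name and sample values below are hypothetical.

```python
import math


def linear_correlation(x, y):
    """Pearson linear correlation between two equal-length sequences.

    Assumption: this is the textbook definition, not necessarily the
    exact evaluation procedure used in the paper.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations from the mean for each sequence.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Unnormalized covariance and the product of the two deviation norms.
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


# Hypothetical toy values standing in for a clean and an enhanced signal.
clean = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
enhanced = [0.1, 0.4, 0.9, 0.6, 0.0, -0.4, -1.1, -0.5]
print(round(linear_correlation(clean, enhanced), 4))
```

A value close to 1 indicates that the enhanced signal tracks the reference closely, which is the sense in which a figure such as 0.9607 would be read under this definition.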
Pages: 35707 - 35732
Page count: 26
Related papers
50 in total
  • [1] Auditory Speech Perception Tests in Relation to the Coding Strategy in Cochlear Implant
    Bazon, Aline Cristine
    Mantello, Erika Barioni
    Goncales, Alina Sanches
    Isaac, Myriam de Lima
    Hyppolito, Miguel Angelo
    Mirandola Barbosa Reis, Ana Claudia
    INTERNATIONAL ARCHIVES OF OTORHINOLARYNGOLOGY, 2016, 20 (03) : 254 - 260
  • [2] A New Speech Coding Strategy for Cochlear Implant
    Wang, Wei-Dong
    Liu, Hong-Yun
    Yuan, Hu
    Ang, Qing
    JOURNAL OF MEDICAL AND BIOLOGICAL ENGINEERING, 2010, 30 (05) : 335 - 342
  • [3] Speech perception with F0mod, a cochlear implant pitch coding strategy
    Francart, Tom
    Osses, Alejandro
    Wouters, Jan
    INTERNATIONAL JOURNAL OF AUDIOLOGY, 2015, 54 (06) : 424 - 432
  • [4] Auditory characteristics-based speech coding strategy in cochlear implant
    Yang, Dan
    Xu, Bin
    Li, Feng
    Wang, Xu
    Dongbei Daxue Xuebao/Journal of Northeastern University, 2014, 35 (02): : 212 - 216
  • [5] Deep Learning-Based Noise Reduction Approach to Improve Speech Intelligibility for Cochlear Implant Recipients
    Lai, Ying-Hui
    Tsao, Yu
    Lu, Xugang
    Chen, Fei
    Su, Yu-Ting
    Chen, Kuang-Chao
    Chen, Yu-Hsuan
    Chen, Li-Ching
    Li, Lieber Po-Hung
    Lee, Chin-Hui
    EAR AND HEARING, 2018, 39 (04): : 795 - 809
  • [6] An Improved Speech Coding Strategy for Cochlear Implants
    Liu, Hongyun
    Wang, Weidong
    Liu, Guangrong
    Zhang, Zhengbo
    2010 3RD INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS (BMEI 2010), VOLS 1-7, 2010, : 1416 - 1419
  • [7] Combining deep learning-based online beamforming with spectral subtraction for speech recognition in noisy environments
    Yoon, Sung-Wook
    Kwon, Oh-Wook
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): : 439 - 451
  • [8] A perception-based processing strategy for cochlear implants and speech coding
    Nie, K
    Zeng, FG
    PROCEEDINGS OF THE 26TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2004, 26 : 4205 - 4208
  • [9] The effect of a coding strategy that removes temporally masked pulses on speech perception by cochlear implant users
    Lamping, Wiebke
    Goehring, Tobias
    Marozeau, Jeremy
    Carlyon, Robert P.
    HEARING RESEARCH, 2020, 391
  • [10] Improvement of Cochlear Implant Coding Strategy Based on Chinese Speech Boundary Information
    Wei, Mingfei
    Wang, Huiqin
    Fan, Qingyang
    Jiang, Li
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS AND COMPUTER AIDED EDUCATION (ICISCAE 2018), 2018, : 402 - 405