Deep Learning-Based Coding Strategy for Improved Cochlear Implant Speech Perception in Noisy Environments

Cited by: 0
Authors
Essaid, Billel [1]
Kheddar, Hamza [1]
Batel, Noureddine [1]
Chowdhury, Muhammad E. H. [2]
Affiliations
[1] Univ MEDEA, Elect Engn Dept, LSEA Lab, Medea 26000, Algeria
[2] Qatar Univ, Dept Elect Engn, Doha, Qatar
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Speech enhancement; Noise measurement; Noise reduction; Noise; Convolutional neural networks; Autoencoders; Biological system modeling; Training; Real-time systems; Feature extraction; Cochlear implant; deep learning; sound coding strategy; speech enhancement; transformer; NEURAL-NETWORK; DENOISING AUTOENCODER; INTELLIGIBILITY; ENHANCEMENT;
DOI
10.1109/ACCESS.2025.3542953
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic speech recognition (ASR) and speech enhancement are essential tools in modern life, aiding not only in machine interaction but also in supporting individuals with hearing impairments. These processes begin with capturing speech in analog form and applying signal processing algorithms to ensure compatibility with devices like cochlear implants (CIs). However, CIs, with their limited number of electrodes, often cause speech distortion, and despite advancements in state-of-the-art signal processing techniques, challenges persist, particularly in noisy environments with multiple speech sources. The rise of artificial intelligence (AI) has introduced innovative strategies to address these limitations. This paper presents a novel deep learning (DL)-based technique that leverages attention mechanisms to improve speech intelligibility through noise suppression. The proposed approach includes two strategies: the first integrates temporal convolutional networks (TCNs) and multi-head attention (MHA) layers to capture both local and global dependencies within the speech signal, enabling precise noise filtering and improved clarity. The second strategy builds on this framework by additionally incorporating bidirectional gated recurrent units (Bi-GRU) alongside TCN and MHA layers, further refining sequence modeling and enhancing noise reduction. The optimal model configuration, using TCN-MHA-Bi-GRU with a kernel size of 16, achieved a compact model size of 788K parameters and recorded training and validation losses of 0.0350 and 0.0446, respectively. Experimental results on the TIMIT and Harvard Sentences datasets, enriched with diverse noise sources from the DEMAND database, yielded high intelligibility scores with a short-time objective intelligibility (STOI) of 0.8345, a word recognition score (WRS) of 99.2636, and a linear correlation coefficient (LCC) of 0.9607, underscoring the model's capability to enhance speech perception in noisy CI environments, ensuring a balance between model size and speech quality, and surpassing the existing state-of-the-art techniques.
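The abstract reports a linear correlation coefficient (LCC) but does not say how it is computed or on which signals. As a rough illustration only, the sketch below assumes LCC is the standard Pearson correlation between two equal-length sequences (for example, samples of a clean reference and an enhanced output). The function name `linear_correlation` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def linear_correlation(reference, estimate):
    """Pearson linear correlation between two equal-length sequences.

    Assumption: this is the generic textbook definition, not necessarily
    the exact evaluation procedure used by the paper's authors.
    """
    if len(reference) != len(estimate) or len(reference) < 2:
        raise ValueError("inputs must have the same length (at least 2)")

    mean_ref = sum(reference) / len(reference)
    mean_est = sum(estimate) / len(estimate)

    # Covariance and variances, without the 1/N factor (it cancels out).
    cov = sum((r - mean_ref) * (e - mean_est)
              for r, e in zip(reference, estimate))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_est = sum((e - mean_est) ** 2 for e in estimate)

    if var_ref == 0 or var_est == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / math.sqrt(var_ref * var_est)


# Hypothetical toy values: a reference signal and a scaled, shifted copy.
clean = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0]
enhanced = [2.0 * c + 0.1 for c in clean]
score = linear_correlation(clean, enhanced)
```

Because the correlation ignores gain and offset, a scaled and shifted copy of the reference scores close to 1; values nearer 0 would indicate that the estimate no longer tracks the reference.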
Pages: 35707-35732
Page count: 26
Related papers
50 in total
  • [21] DNoiseNet: Deep learning-based feedback active noise control in various noisy environments
    Cha, Young-Jin
    Mostafavi, Alireza
    Benipal, Sukhpreet S.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 121
  • [22] A cochlear implant speech processing strategy based on an auditory model
    Grayden, DB
    Burkitt, AN
    Kenny, OP
    Clarey, JC
    Paolini, AG
    Clark, GM
    PROCEEDINGS OF THE 2004 INTELLIGENT SENSORS, SENSOR NETWORKS & INFORMATION PROCESSING CONFERENCE, 2004, : 491 - 496
  • [23] A Speech Processing Strategy based on Sinusoidal Speech Model for Cochlear Implant Users
    Lee, Sungmin
    Akbarzadeh, Sara
    Singh, Satnam
    Tuan-Tan, Chin
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 393 - 397
  • [24] ElectrodeNet-A Deep-Learning-Based Sound Coding Strategy for Cochlear Implants
    Huang, Enoch Hsin-Ho
    Chao, Rong
    Tsao, Yu
    Wu, Chao-Min
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (01) : 346 - 357
  • [25] Speech perception results in adults implanted with the CLARION® Multi-Strategy™ cochlear implant
    Osberger, MJ
    Fisher, L
    Kalberer, A
    UPDATES IN COCHLEAR IMPLANTATION, 2000, 57 : 421 - 424
  • [26] Speech perception results in children using the Clarion® Multi-Strategy™ Cochlear Implant
    Osberger, MJ
    Kalberer, A
    Zimmerman-Phillips, S
    Barker, MJ
    Geier, L
    ANNALS OF OTOLOGY RHINOLOGY AND LARYNGOLOGY, 2000, 109 (12) : 75 - 77
  • [27] Speech perception results in children implanted with the CLARION® Multi-Strategy™ cochlear implant
    Osberger, MJ
    Fisher, L
    Kalberer, A
    UPDATES IN COCHLEAR IMPLANTATION, 2000, 57 : 417 - 420
  • [28] Deep Learning-Based Speech Enhancement With a Loss Trading Off the Speech Distortion and the Noise Residue for Cochlear Implants
    Kang, Yuyong
    Zheng, Nengheng
    Meng, Qinglin
    FRONTIERS IN MEDICINE, 2021, 8
  • [29] Effect of deep insertion of the cochlear implant electrode array on pitch estimation and speech perception
    Hamzavi, Jafar
    Arnoldner, Christoph
    ACTA OTO-LARYNGOLOGICA, 2006, 126 (11) : 1182 - 1187
  • [30] SPEECH ENHANCEMENT BASED ON NEURAL NETWORKS APPLIED TO COCHLEAR IMPLANT CODING STRATEGIES
    Bolner, Federico
    Goehring, Tobias
    Monaghan, Jessica
    van Dijk, Bas
    Wouters, Jan
    Bleeck, Stefan
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 6520 - 6524