A Feature Integration Network for Multi-Channel Speech Enhancement

Times cited: 0
Authors
Zeng, Xiao [1]
Zhang, Xue [1]
Wang, Mingjiang [1]
Affiliation
[1] Harbin Inst Technol, Key Lab Key Technol IoT Terminals, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-channel speech enhancement; LSTM; deep learning; self-attention
DOI
10.3390/s24227344
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Subject classification codes
070302; 081704
Abstract
Multi-channel speech enhancement has become an active area of research, with recent methods demonstrating excellent performance in recovering the desired speech signal from noisy environments. Recent approaches increasingly exploit the spectral information in multi-channel inputs, with promising results. In this study, we propose a novel feature integration network that not only captures spectral information but also refines it through shifted-window-based self-attention, improving the quality and precision of feature extraction. The network is built from blocks, each containing a full- and sub-band LSTM module that captures spectral information and a global-local attention fusion module that refines it. The full- and sub-band LSTM module integrates full-band and sub-band information through two LSTM layers, while the global-local attention fusion module learns global and local attention in a dual-branch architecture. To further enhance feature integration, the outputs of the two branches are fused by a spatial attention module. The model is trained to predict the complex ratio mask (CRM), thereby improving the quality of the enhanced signal. An ablation study assessing the contribution of each module shows that every module has a significant impact on performance. Finally, the model was trained on the SPA-DNS dataset (circular microphone array) and the Libri-wham dataset (linear microphone array), achieving results competitive with state-of-the-art models.
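The complex ratio mask (CRM) named in the abstract can be illustrated with a minimal sketch. It assumes that the short-time Fourier transform (STFT) bins are flattened into plain Python lists of complex numbers and that the mask is applied to a single reference-channel spectrum. The function names `ideal_crm` and `apply_crm` are hypothetical. This is not the authors' implementation, only an illustration of how a CRM is defined and applied.

```python
# Minimal CRM sketch (hypothetical helpers, not the paper's code).
# Assumption: each list element is one time-frequency bin of a
# reference-channel STFT, stored as a Python complex number.

def ideal_crm(clean, noisy, eps=1e-8):
    """Ideal CRM M = S / Y per bin, i.e. the training target a model would predict."""
    mask = []
    for s, y in zip(clean, noisy):
        # Complex division written as S * conj(Y) / |Y|^2; eps avoids division by zero.
        mask.append(s * y.conjugate() / (abs(y) ** 2 + eps))
    return mask


def apply_crm(mask, noisy):
    """Enhanced bin = M * Y as a complex product, so both magnitude and phase change."""
    enhanced = []
    for m, y in zip(mask, noisy):
        real = m.real * y.real - m.imag * y.imag
        imag = m.real * y.imag + m.imag * y.real
        enhanced.append(complex(real, imag))
    return enhanced


if __name__ == "__main__":
    noisy = [1 + 1j, 2 - 1j, 0.5 + 0j]
    clean = [0.8 + 0.9j, 1.5 - 0.5j, 0.1 + 0.2j]
    mask = ideal_crm(clean, noisy)
    # Applying the ideal mask recovers the clean bins up to the tiny eps error.
    print(apply_crm(mask, noisy))
```

Unlike a real-valued magnitude mask, the CRM's imaginary part lets the network correct the phase of each bin as well as its magnitude. This is the usual motivation for predicting a CRM instead of a magnitude-only mask.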
Pages: 13