A Feature Integration Network for Multi-Channel Speech Enhancement

Citations: 0
Authors
Zeng, Xiao [1]
Zhang, Xue [1]
Wang, Mingjiang [1]
Affiliations
[1] Harbin Inst Technol, Key Lab Key Technol IoT Terminals, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-channel speech enhancement; LSTM; deep learning; self-attention
DOI
10.3390/s24227344
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Multi-channel speech enhancement has become an active area of research and has shown strong performance in recovering desired speech signals from noisy environments. Recent approaches have increasingly focused on leveraging spectral information from multi-channel inputs, with promising results. In this study, we propose a novel feature integration network that not only captures spectral information but also refines it through shifted-window-based self-attention, improving the quality and precision of feature extraction. The network is built from blocks, each containing a full- and sub-band LSTM module that captures spectral information and a global-local attention fusion module that refines it. The full- and sub-band LSTM module integrates full-band and sub-band information through two LSTM layers, while the global-local attention fusion module learns global and local attention in a dual-branch architecture. To further strengthen feature integration, the outputs of the two branches are fused by a spatial attention module. The model is trained to predict the complex ratio mask (CRM), which is used to produce the enhanced signal. We conducted an ablation study to assess the contribution of each module, and each showed a significant impact on performance. Additionally, our model was trained on the SPA-DNS dataset, which uses a circular microphone array, and on the Libri-wham dataset, which uses a linear microphone array, achieving competitive results compared to state-of-the-art models.
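The abstract states that the model predicts a complex ratio mask (CRM) but does not spell out how the mask is applied. The sketch below illustrates the standard CRM formulation only, not the authors' implementation: the mask is multiplied, as a complex number, with the noisy STFT of a single reference channel. The function names apply_crm and ideal_crm, the choice of a single reference channel, and the random stand-in spectrograms are all assumptions made for illustration.

```python
import numpy as np

def apply_crm(noisy_stft, mask_real, mask_imag):
    """Apply a complex ratio mask by complex multiplication (hypothetical helper)."""
    y_r, y_i = noisy_stft.real, noisy_stft.imag
    # (M_r + j M_i) * (Y_r + j Y_i), expanded into real and imaginary parts
    s_r = mask_real * y_r - mask_imag * y_i
    s_i = mask_real * y_i + mask_imag * y_r
    return s_r + 1j * s_i

def ideal_crm(clean_stft, noisy_stft, eps=1e-12):
    """Ideal CRM, i.e. S / Y computed per time-frequency bin (hypothetical helper)."""
    ratio = clean_stft * np.conj(noisy_stft) / (np.abs(noisy_stft) ** 2 + eps)
    return ratio.real, ratio.imag

# Random complex arrays stand in for STFTs of shape (frequency bins, frames).
rng = np.random.default_rng(0)
shape = (65, 20)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise  # assumed reference-channel mixture

mask_r, mask_i = ideal_crm(clean, noisy)
enhanced = apply_crm(noisy, mask_r, mask_i)
```

With the ideal mask, enhanced matches clean up to numerical error. In a trained system the network's predicted mask would take the place of ideal_crm, and an inverse STFT would convert the result back to a waveform.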
Pages: 13