A Feature Integration Network for Multi-Channel Speech Enhancement

Cited by: 0
Authors
Zeng, Xiao [1]
Zhang, Xue [1]
Wang, Mingjiang [1]
Affiliations
[1] Harbin Inst Technol, Key Lab Key Technol IoT Terminals, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-channel speech enhancement; LSTM; deep learning; self-attention
DOI
10.3390/s24227344
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Multi-channel speech enhancement has become an active area of research, demonstrating excellent performance in recovering desired speech signals from noisy environments. Recent approaches have increasingly focused on leveraging spectral information from multi-channel inputs, yielding promising results. In this study, we propose a novel feature integration network that not only captures spectral information but also refines it through shifted-window-based self-attention, enhancing the quality and precision of the feature extraction. Our network consists of blocks containing a full- and sub-band LSTM module for capturing spectral information, and a global-local attention fusion module for refining this information. The full- and sub-band LSTM module integrates both full-band and sub-band information through two LSTM layers, while the global-local attention fusion module learns global and local attention in a dual-branch architecture. To further enhance the feature integration, we fuse the outputs of these branches using a spatial attention module. The model is trained to predict the complex ratio mask (CRM), thereby improving the quality of the enhanced signal. We conducted an ablation study to assess the contribution of each module, with each showing a significant impact on performance. Additionally, our model was trained on the SPA-DNS dataset using a circular microphone array and the Libri-wham dataset with a linear microphone array, achieving competitive results compared to state-of-the-art models.
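The abstract states that the model is trained to predict a complex ratio mask (CRM). As a minimal sketch of what applying such a mask means, and not the authors' implementation, the snippet below multiplies each noisy time-frequency bin by a complex mask value. The function names `apply_crm` and `ideal_crm`, the toy bin values, and the `eps` stabilizer are illustrative assumptions; the ideal mask is computed from a known clean signal only to show that the masking step can recover it.

```python
# Minimal sketch of complex ratio masking on a handful of STFT bins.
# All names and values here are hypothetical and chosen for illustration.

def apply_crm(noisy_bins, mask_real, mask_imag):
    """Multiply each noisy complex bin by its complex mask value."""
    return [complex(mr, mi) * y
            for y, mr, mi in zip(noisy_bins, mask_real, mask_imag)]

def ideal_crm(clean_bins, noisy_bins, eps=1e-12):
    """Mask M with S = M * Y, i.e. M = S * conj(Y) / |Y|^2 (eps avoids 0/0)."""
    return [s * y.conjugate() / (abs(y) ** 2 + eps)
            for s, y in zip(clean_bins, noisy_bins)]

# Toy clean speech and noise values for three time-frequency bins.
clean = [1 + 2j, -0.5 + 0.25j, 3 - 1j]
noise = [0.5 - 0.5j, 0.25 + 0.75j, -1 + 0.5j]
noisy = [s + n for s, n in zip(clean, noise)]

# A network would predict the real and imaginary mask parts; here the
# ideal mask stands in for that prediction.
mask = ideal_crm(clean, noisy)
enhanced = apply_crm(noisy, [m.real for m in mask], [m.imag for m in mask])
```

Because the mask is complex, it can correct both the magnitude and the phase of each bin, which a real-valued magnitude mask cannot do.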
Pages: 13