Factorized MVDR Deep Beamforming for Multi-Channel Speech Enhancement

Cited by: 4
Authors
Kim, Hansol [1]
Kang, Kyeongmuk [1]
Shin, Jong Won [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Funding
National Research Foundation of Singapore
Keywords
Speech enhancement; Estimation; Artificial neural networks; MISO communication; Array signal processing; Deep learning; Microphone arrays; Multi-channel speech enhancement; deep learning-based beamforming; factorized MVDR beamformer; NEURAL-NETWORK; SEPARATION; ATTENTION;
DOI
10.1109/LSP.2022.3200581
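Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract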
Traditionally, adaptive beamformers such as the minimum-variance distortionless response (MVDR) beamformer and the generalized eigenvalue beamformer have been widely used for multi-channel speech enhancement, together with a single-channel postfilter. Recently, several approaches have been proposed that use deep neural networks (DNNs) to enhance the signals used to estimate speech and noise spatial covariance matrices (SCMs) and to process the outputs of the beamformers. However, preprocessing the signals for SCM estimation may disrupt the phase relations among input signals, and the time averages used to estimate the speech and noise SCMs may not be optimal for beamformer performance even when the estimated signals are close to the ground truth. In this letter, we propose a deep beamforming approach that estimates factors of the MVDR beamformer using a DNN to circumvent the difficulty of speech and noise SCM estimation. We formulate the MVDR beamformer in a factorized form involving two complex factors and estimate them using a DNN with a cost function that compares the beamformed signal with the original clean speech. Experimental results showed that the proposed factorized MVDR beamformer could mimic the characteristics of the MVDR beamformer with the true relative transfer function and noise SCM, and that it outperformed the MVDR beamformer with deep learning-based pre- and post-processing in terms of perceptual evaluation of speech quality (PESQ) scores.
Pages: 1898-1902
Page count: 5