Factorized MVDR Deep Beamforming for Multi-Channel Speech Enhancement

Cited by: 4
Authors
Kim, Hansol [1]
Kang, Kyeongmuk [1]
Shin, Jong Won [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 61005, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Speech enhancement; Estimation; Artificial neural networks; MISO communication; Array signal processing; Deep learning; Microphone arrays; Multi-channel speech enhancement; deep learning-based beamforming; factorized MVDR beamformer; NEURAL-NETWORK; SEPARATION; ATTENTION;
DOI
10.1109/LSP.2022.3200581
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Subject classification codes
0808; 0809;
Abstract
Traditionally, adaptive beamformers such as the minimum-variance distortionless response (MVDR) beamformer and the generalized eigenvalue beamformer have been widely used for multi-channel speech enhancement, together with a single-channel postfilter. Recently, several approaches have been proposed that use deep neural networks (DNNs) to enhance the signals used to estimate the speech and noise spatial covariance matrices (SCMs) and to process the outputs of the beamformers. However, preprocessing the signals for SCM estimation may disrupt the phase relations among the input signals, and the time averages used to estimate the speech and noise SCMs may not be optimal for beamformer performance even when the estimated signals are close to the ground truth. In this letter, we propose a deep beamforming approach which estimates factors of the MVDR beamformer using a DNN to circumvent the difficulty of speech and noise SCM estimation. We formulate the MVDR beamformer in a factorized form involving two complex factors and estimate them using a DNN with a cost function that compares the beamformed signal with the original clean speech. Experimental results showed that the proposed factorized MVDR beamformer could mimic the characteristics of the MVDR beamformer with the true relative transfer function and noise SCM, and that it outperformed the MVDR beamformer with deep learning-based pre- and post-processing in terms of perceptual evaluation of speech quality scores.
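For orientation, the classical MVDR beamformer that the abstract uses as its reference point (relative transfer function plus noise SCM) can be sketched as below. This is only the textbook closed form w = Φ_n⁻¹h / (hᴴΦ_n⁻¹h) for a single frequency bin; it is not the letter's factorized formulation, whose two complex factors are not specified in the abstract. The function name `mvdr_weights`, the diagonal-loading term, and the random test data are assumptions made for illustration.

```python
import numpy as np


def mvdr_weights(noise_scm, rtf, diag_load=1e-6):
    """Textbook MVDR weights for one frequency bin (illustrative sketch).

    noise_scm: (M, M) Hermitian noise spatial covariance matrix.
    rtf:       (M,) complex relative transfer function (reference mic = 1).
    diag_load: hypothetical regularization factor, not taken from the letter.
    """
    m = noise_scm.shape[0]
    # Diagonal loading keeps the linear solve well conditioned (an assumption).
    loaded = noise_scm + diag_load * np.trace(noise_scm).real / m * np.eye(m)
    numerator = np.linalg.solve(loaded, rtf)   # Phi_n^{-1} h
    denominator = np.vdot(rtf, numerator)      # h^H Phi_n^{-1} h
    return numerator / denominator


# Synthetic data for a 4-microphone array (random values, illustration only).
rng = np.random.default_rng(0)
M = 4
rtf = rng.standard_normal(M) + 1j * rng.standard_normal(M)
rtf = rtf / rtf[0]                             # normalize to the reference mic
A = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
noise_scm = A @ A.conj().T / (2 * M)           # Hermitian positive definite

w = mvdr_weights(noise_scm, rtf)
distortionless = np.vdot(w, rtf)               # w^H h, should equal 1
noise_out = np.vdot(w, noise_scm @ w).real     # output noise power w^H Phi_n w
noise_ref = noise_scm[0, 0].real               # noise power at the reference mic
```

The distortionless constraint wᴴh = 1 holds by construction, and the output noise power cannot exceed that of simply selecting the reference microphone, since that selection also satisfies the constraint.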
Pages: 1898-1902
Page count: 5
Related papers
50 in total
  • [21] Speech Enhancement Integrating the MVDR Beamforming and T-F Masking
    Zhu, Jinru
    Bao, Changchun
    Cheng, Rui
    CONFERENCE PROCEEDINGS OF 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2019), 2019,
  • [22] MULTI-CHANNEL SPEECH ENHANCEMENT USING GRAPH NEURAL NETWORKS
    Tzirakis, Panagiotis
    Kumar, Anurag
    Donley, Jacob
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3415 - 3419
  • [23] A separation and interaction framework for causal multi-channel speech enhancement
    Liu, Wenzhe
    Li, Andong
    Zheng, Chengshi
    Li, Xiaodong
    DIGITAL SIGNAL PROCESSING, 2022, 126
  • [24] Multi-channel Speech Enhancement with Multiple-target GANs
    Yuan, Jing
    Bao, Changchun
    2020 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2020), 2020,
  • [25] MULTI-CHANNEL SPEECH ENHANCEMENT BASED ON INDEPENDENT VECTOR EXTRACTION
    Cmejla, Jaroslav
    Koldovsky, Zbynek
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 525 - 529
  • [26] DEEP MULTI-FRAME MVDR FILTERING FOR SINGLE-MICROPHONE SPEECH ENHANCEMENT
    Tammen, Marvin
    Doclo, Simon
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8443 - 8447
  • [27] Eigenvector-Based Speech Mask Estimation for Multi-Channel Speech Enhancement
    Pfeifenberger, Lukas
    Zoehrer, Matthias
    Pernkopf, Franz
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (12) : 2162 - 2172
  • [28] A generic neural acoustic beamforming architecture for robust multi-channel speech processing
    Heymann, Jahn
    Drude, Lukas
    Haeb-Umbach, Reinhold
    COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 374 - 385
  • [29] A Causal U-net based Neural Beamforming Network for Real-Time Multi-Channel Speech Enhancement
    Ren, Xinlei
    Zhang, Xu
    Chen, Lianwu
    Zheng, Xiguang
    Zhang, Chen
    Guo, Liang
    Yu, Bing
    INTERSPEECH 2021, 2021, : 1832 - 1836
  • [30] Multi-objective based multi-channel speech enhancement with BiLSTM network
    Cui, Xingyue
    Chen, Zhe
    Yin, Fuliang
    APPLIED ACOUSTICS, 2021, 177