Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation

Cited by: 21
Authors
Zhang, Zhuohuang [1, 2]
Xu, Yong [3]
Yu, Meng [3]
Zhang, Shi-Xiong [3]
Chen, Lianwu [4]
Williamson, Donald S. [1]
Yu, Dong [3]
Affiliations
[1] Indiana Univ, Dept Comp Sci, Bloomington, IN 47408 USA
[2] Indiana Univ, Dept Speech Language & Hearing Sci, Bloomington, IN 47408 USA
[3] Tencent AI Lab, Bellevue, WA 98004 USA
[4] Tencent AI Lab, Shenzhen 518054, Peoples R China
Keywords
Nonlinear distortion; Covariance matrices; Artificial neural networks; Array signal processing; Noise measurement; Feature extraction; Task analysis; Speech separation; Deep learning; MVDR; ADL-MVDR; Recurrent neural network; Noise reduction; Enhancement; Single; Performance; Model; Dereverberation; Recognition; Beamformer
DOI
10.1109/TASLP.2021.3129335
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Many purely neural-network-based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions; however, conventional neural mask-based MVDR systems still leave relatively high levels of residual noise. Moreover, the matrix inverse involved in the MVDR solution is sometimes numerically unstable during joint training with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The proposed MCMF ADL-MVDR system addresses both linear and nonlinear distortions and fully utilizes spatio-temporal cross-correlations. The proposed systems are evaluated on a Mandarin audio-visual corpus and compared with several state-of-the-art approaches. Experimental results demonstrate the superiority of the proposed systems across different scenarios and several objective evaluation metrics, including ASR performance.
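For reference, the conventional MVDR solution the abstract refers to computes, per frequency bin, the weights w = R_n^-1 d / (d^H R_n^-1 d), where R_n is the noise covariance matrix (in mask-based systems, estimated by weighting each frame with a neural-network mask) and d is the target steering vector. The explicit inverse R_n^-1 is the step the abstract calls numerically unstable during joint training; the preliminary ADL-MVDR work replaces it with recurrent neural networks. The sketch below illustrates only the conventional formula, not the authors' ADL-MVDR. It assumes a Hermitian positive-definite covariance, uses diagonal loading (`loading`) as an assumed stabilizer, and every name in it (`solve`, `masked_covariance`, `mvdr_weights`, `apply_beamformer`, `stack_frames`) is hypothetical. `stack_frames` shows one assumed way to form a multi-frame input by concatenating the current and previous frames of all channels.

```python
# Minimal sketch of a *conventional* mask-based MVDR beamformer for a single
# frequency bin, in pure Python. This is NOT the ADL-MVDR method: ADL-MVDR
# replaces the explicit matrix inverse with recurrent neural networks.
# All function names and parameters here are hypothetical.


def solve(a, b):
    """Solve a @ x = b for a square complex matrix `a` (list of rows)
    using Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def masked_covariance(vectors, mask):
    """Mask-weighted covariance sum_t m_t y_t y_t^H / sum_t m_t, where the
    mask values (e.g. predicted by a neural network) lie in [0, 1]."""
    n = len(vectors[0])
    cov = [[0j] * n for _ in range(n)]
    for y, m in zip(vectors, mask):
        for i in range(n):
            for j in range(n):
                cov[i][j] += m * y[i] * y[j].conjugate()
    total = sum(mask)
    return [[c / total for c in row] for row in cov]


def mvdr_weights(noise_cov, steering, loading=1e-6):
    """w = R^-1 d / (d^H R^-1 d). Diagonal loading is an assumed safeguard
    against an ill-conditioned R, not part of the MVDR definition."""
    n = len(steering)
    r = [[noise_cov[i][j] + (loading if i == j else 0) for j in range(n)]
         for i in range(n)]
    r_inv_d = solve(r, steering)  # R^-1 d without forming the inverse
    denom = sum(d.conjugate() * v for d, v in zip(steering, r_inv_d))
    return [v / denom for v in r_inv_d]


def apply_beamformer(w, y):
    """Beamformer output w^H y for one time-frequency bin."""
    return sum(wi.conjugate() * yi for wi, yi in zip(w, y))


def stack_frames(frames, t, taps):
    """Concatenate frames t, t-1, ..., t-taps+1 (zeros before the start) of
    an M-channel signal into one M*taps vector: a hypothetical multi-frame
    input to which the same covariance and weight formulas can be applied."""
    num_ch = len(frames[0])
    out = []
    for k in range(taps):
        idx = t - k
        out.extend(frames[idx] if idx >= 0 else [0j] * num_ch)
    return out


if __name__ == "__main__":
    d = [1 + 0j, 0.5 - 0.5j]  # assumed target steering vector (M = 2)
    noise = [[1 + 0j, 0.2j], [0.3 + 0j, 1 - 0.1j], [0.5 + 0.5j, -1 + 0j]]
    mask = [0.9, 0.8, 1.0]    # assumed noise-mask values per frame
    w = mvdr_weights(masked_covariance(noise, mask), d)
    print(abs(apply_beamformer(w, d) - 1) < 1e-9)  # True: distortionless
```

The final check prints `True` because the distortionless constraint w^H d = 1 holds by construction. Solving the linear system instead of forming R_n^-1 is a standard numerical choice, but it does not remove the ill-conditioning problem the abstract describes. In multi-frame MVDR formulations the steering vector is typically replaced by a speech correlation vector over the stacked frames; this sketch does not estimate that vector.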
Pages: 3526-3540
Page count: 15
Related papers (50 in total)
  • [1] ADL-MVDR: ALL DEEP LEARNING MVDR BEAMFORMER FOR TARGET SPEECH SEPARATION
    Zhang, Zhuohuang
    Xu, Yong
    Yu, Meng
    Zhang, Shi-Xiong
    Chen, Lianwu
    Yu, Dong
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6089 - 6093
  • [2] Reduced-Complexity Semi-Distributed Multi-Channel Multi-Frame MVDR Filter
    Ranjbaryan, Raziyeh
    Abutalebi, Hamid Reza
    Doclo, Simon
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2095 - 2099
  • [3] Multi-Modal Multi-Channel Target Speech Separation
    Gu, Rongzhi
    Zhang, Shi-Xiong
    Xu, Yong
    Chen, Lianwu
    Zou, Yuexian
    Yu, Dong
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (03) : 530 - 541
  • [4] DEEP MULTI-FRAME MVDR FILTERING FOR SINGLE-MICROPHONE SPEECH ENHANCEMENT
    Tammen, Marvin
    Doclo, Simon
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8443 - 8447
  • [5] Multi-channel Speech Enhancement Based on the MVDR Beamformer and Postfilter
    Wang, Dujuan
    Bao, Changchun
    2020 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2020), 2020
  • [6] Factorized MVDR Deep Beamforming for Multi-Channel Speech Enhancement
    Kim, Hansol
    Kang, Kyeongmuk
    Shin, Jong Won
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1898 - 1902
  • [7] Sensitivity Analysis of the Multi-Frame MVDR Filter for Single-Microphone Speech Enhancement
    Fischer, Dorte
    Doclo, Simon
    2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 603 - 607
  • [8] SEQUENTIAL MULTI-FRAME NEURAL BEAMFORMING FOR SPEECH SEPARATION AND ENHANCEMENT
    Wang, Zhong-Qiu
    Erdogan, Hakan
    Wisdom, Scott
    Wilson, Kevin
    Raj, Desh
    Watanabe, Shinji
    Chen, Zhuo
    Hershey, John R.
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 905 - 911
  • [9] Iteratively Refined Multi-Channel Speech Separation
    Zhang, Xu
    Bao, Changchun
    Yang, Xue
    Zhou, Jing
    APPLIED SCIENCES-BASEL, 2024, 14 (14)
  • [10] Three-Stage Multi-Frame Multi-Channel In-Loop Filter of VVC
    Li, Si
    Qi, Honggang
    Zhang, Yundong
    Cui, Guoqin
    ELECTRONICS, 2025, 14 (05)