Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation

Cited by: 21
Authors
Zhang, Zhuohuang [1, 2]
Xu, Yong [3 ]
Yu, Meng [3 ]
Zhang, Shi-Xiong [3 ]
Chen, Lianwu [4 ]
Williamson, Donald S. [1 ]
Yu, Dong [3 ]
Affiliations
[1] Indiana Univ, Dept Comp Sci, Bloomington, IN 47408 USA
[2] Indiana Univ, Dept Speech Language & Hearing Sci, Bloomington, IN 47408 USA
[3] Tencent AI Lab, Bellevue, WA 98004 USA
[4] Tencent AI Lab, Shenzhen 518054, Peoples R China
Keywords
Nonlinear distortion; Covariance matrices; Artificial neural networks; Array signal processing; Noise measurement; Feature extraction; Task analysis; Speech separation; deep learning; MVDR; ADL-MVDR; RECURRENT NEURAL-NETWORK; NOISE-REDUCTION; ENHANCEMENT; SINGLE; PERFORMANCE; MODEL; DEREVERBERATION; RECOGNITION; BEAMFORMER;
DOI
10.1109/TASLP.2021.3129335
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Many purely neural-network-based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions; however, conventional neural mask-based MVDR systems still result in relatively high levels of residual noise. Moreover, the matrix inverse involved in the MVDR solution is sometimes numerically unstable during joint training with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The proposed MCMF ADL-MVDR system addresses linear and nonlinear distortions. Spatio-temporal cross correlations are also fully utilized in the proposed approach. The proposed systems are evaluated using a Mandarin audio-visual corpus and are compared with several state-of-the-art approaches. Experimental results demonstrate the superiority of our proposed systems under different scenarios and across several objective evaluation metrics, including ASR performance.
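For orientation, the matrix inverse mentioned in the abstract comes from the textbook MVDR solution w = Φ⁻¹v / (vᴴΦ⁻¹v), where Φ is the noise covariance matrix and v the steering vector for one frequency bin. The sketch below illustrates only that conventional formula, not the ADL-MVDR method of the paper. The function names, the `diag_load` constant, and the toy two-microphone values are all assumptions made for illustration, and plain Python is used only to keep the example dependency-free.

```python
import cmath


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot magnitude for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mvdr_weights(phi_nn, steering, diag_load=1e-6):
    """Conventional MVDR weights for a single frequency bin.

    phi_nn:    M x M Hermitian noise covariance (nested lists)
    steering:  length-M steering vector
    diag_load: hypothetical regularization constant (an assumption,
               not a value taken from the paper)
    """
    m = len(steering)
    avg_power = sum(phi_nn[i][i].real for i in range(m)) / m
    # Diagonal loading keeps the linear solve well conditioned.
    loaded = [
        [phi_nn[i][j] + (diag_load * avg_power if i == j else 0.0)
         for j in range(m)]
        for i in range(m)
    ]
    a = solve(loaded, steering)                      # Phi^{-1} v
    denom = sum(v.conjugate() * x for v, x in zip(steering, a))  # v^H Phi^{-1} v
    return [x / denom for x in a]


# Toy two-microphone example with made-up values.
phi_nn = [[1.0 + 0j, 0.2 + 0.1j], [0.2 - 0.1j, 1.5 + 0j]]
steering = [1.0 + 0j, cmath.exp(-0.5j)]
w = mvdr_weights(phi_nn, steering)
# Distortionless constraint: w^H v should equal 1.
response = sum(wi.conjugate() * vi for wi, vi in zip(w, steering))
```

When the covariance estimate is close to singular, the solve step becomes ill conditioned, which is one common way the numerical instability described in the abstract shows up; diagonal loading is a generic mitigation and is not claimed here to be what the authors use.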
Pages: 3526-3540
Page count: 15
Related papers
50 in total
  • [21] SUBSPACE-BASED SPEECH CORRELATION VECTOR ESTIMATION FOR SINGLE-MICROPHONE MULTI-FRAME MVDR FILTERING
    Fischer, Dorte
    Doclo, Simon
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 856 - 860
  • [22] Complex Neural Spatial Filter: Enhancing Multi-Channel Target Speech Separation in Complex Domain
    Gu, Rongzhi
    Zhang, Shi-Xiong
    Zou, Yuexian
    Yu, Dong
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1370 - 1374
  • [23] Multi-Channel Speech Separation with Cross-Attention and Beamforming
    Mosner, Ladislav
    Plchot, Oldrich
    Peng, Junyi
    Burget, Lukas
    Cernocky, Jan Honza
    INTERSPEECH 2023, 2023, : 1693 - 1697
  • [24] A separation and interaction framework for causal multi-channel speech enhancement
    Liu, Wenzhe
    Li, Andong
    Zheng, Chengshi
    Li, Xiaodong
    DIGITAL SIGNAL PROCESSING, 2022, 126
  • [25] Frequency domain multi-channel speech separation and its applications
    Handa, M
    Nagai, T
    Kurematsu, A
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 2761 - 2764
  • [26] Multi-channel Speech Enhancement with Multiple-target GANs
    Yuan, Jing
    Bao, Changchun
    2020 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2020), 2020,
  • [27] MULTI-BAND PIT AND MODEL INTEGRATION FOR IMPROVED MULTI-CHANNEL SPEECH SEPARATION
    Chen, Lianwu
    Yu, Meng
    Su, Dan
    Yu, Dong
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 705 - 709
  • [28] Improvement of Spatial Ambiguity in Multi-Channel Speech Separation Using Channel Attention
    Hong, Qian-Bei
    Wu, Chung-Hsien
    Thanh Binh Nguyen
    Wang, Hsin-Min
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 619 - 623
  • [29] Two-Stage Single-Channel Speech Enhancement with Multi-Frame Filtering
    Lin, Shaoxiong
    Zhang, Wangyou
    Qian, Yanmin
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [30] Multi-channel signal separation
    Chan, DCB
    Rayner, PJW
    Godsill, SJ
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 649 - 652