Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation

Cited by: 21
Authors
Zhang, Zhuohuang [1, 2]
Xu, Yong [3 ]
Yu, Meng [3 ]
Zhang, Shi-Xiong [3 ]
Chen, Lianwu [4 ]
Williamson, Donald S. [1 ]
Yu, Dong [3 ]
Affiliations
[1] Indiana Univ, Dept Comp Sci, Bloomington, IN 47408 USA
[2] Indiana Univ, Dept Speech Language & Hearing Sci, Bloomington, IN 47408 USA
[3] Tencent AI Lab, Bellevue, WA 98004 USA
[4] Tencent AI Lab, Shenzhen 518054, Peoples R China
Keywords
Nonlinear distortion; Covariance matrices; Artificial neural networks; Array signal processing; Noise measurement; Feature extraction; Task analysis; Speech separation; deep learning; MVDR; ADL-MVDR; RECURRENT NEURAL-NETWORK; NOISE-REDUCTION; ENHANCEMENT; SINGLE; PERFORMANCE; MODEL; DEREVERBERATION; RECOGNITION; BEAMFORMER;
DOI
10.1109/TASLP.2021.3129335
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Many purely neural-network-based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions; however, conventional neural mask-based MVDR systems still leave relatively high levels of residual noise. Moreover, the matrix inverse involved in the MVDR solution is sometimes numerically unstable during joint training with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The proposed MCMF ADL-MVDR system addresses both linear and nonlinear distortions, and it fully utilizes spatio-temporal cross correlations. The proposed systems are evaluated on a Mandarin audio-visual corpus and compared with several state-of-the-art approaches. Experimental results demonstrate the superiority of the proposed systems under different scenarios and across several objective evaluation metrics, including ASR performance.
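For orientation, the matrix inverse mentioned in the abstract appears in the classical MVDR solution, w = R^{-1} v / (v^H R^{-1} v), where R is the noise covariance matrix and v the steering vector for one frequency bin. The following is a minimal NumPy sketch of that textbook formula only; it is not the ADL-MVDR method proposed in the paper. The function names, the random test data, and the diagonal-loading term are illustrative assumptions, with loading shown as one common way to stabilize the inverse.

```python
import numpy as np


def mvdr_weights(noise_cov, steering, loading=1e-6):
    """Textbook MVDR weights for one frequency bin (illustrative sketch).

    noise_cov: (M, M) Hermitian noise covariance matrix R.
    steering:  (M,) steering vector v.
    loading:   assumed relative diagonal-loading factor (not from the paper).
    """
    m = noise_cov.shape[0]
    # Diagonal loading scaled by the average noise power keeps R well conditioned.
    scale = np.trace(noise_cov).real / m
    loaded = noise_cov + loading * scale * np.eye(m)
    # Solve R x = v instead of forming an explicit inverse.
    rinv_v = np.linalg.solve(loaded, steering)
    # Normalize so that the distortionless constraint w^H v = 1 holds.
    return rinv_v / (steering.conj() @ rinv_v)


def demo():
    """Build a hypothetical 4-microphone example from random data."""
    rng = np.random.default_rng(0)
    m, frames = 4, 64
    noise = rng.standard_normal((m, frames)) + 1j * rng.standard_normal((m, frames))
    noise_cov = noise @ noise.conj().T / frames
    steering = np.exp(-1j * np.pi * 0.3 * np.arange(m))
    return mvdr_weights(noise_cov, steering), steering
```

Calling `demo()` returns the weight vector and the steering vector; the product w^H v, computed as `np.vdot(w, v)`, should be 1 up to rounding, which is the distortionless constraint.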
Pages: 3526-3540
Page count: 15