Mask-based blind source separation and MVDR beamforming in ASR

Cited by: 3
Authors
He, Renke [1]
Long, Yanhua [1]
Li, Yijie [2]
Liang, Jiaen [2]
Affiliations
[1] Shanghai Normal Univ, Dept Elect & Informat Engn, Shanghai 200234, Peoples R China
[2] Unisound AI Technol Co Ltd, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cocktail party problem; MVDR; BSS; T-F masking; Speech enhancement; SPEECH SEPARATION; MIXTURES;
DOI
10.1007/s10772-019-09666-x
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
This paper presents a front-end enhancement system for automatic speech recognition (ASR) that addresses the cocktail party problem, that is, recognizing the target speech when multiple speakers talk in noisy real-world environments. Many conventional techniques have been proposed for this problem. In this work, we propose a new framework that integrates conventional blind source separation (BSS) with a minimum variance distortionless response (MVDR) beamformer for the speech enhancement and source separation tasks of the recent CHiME-5 challenge. In our experiments, we found that the BSS-based time-frequency (T-F) mask estimation strategy should differ between speech enhancement and source separation. The main difference is whether background noise needs to be accounted for as an additional class during T-F mask estimation. Experimental results showed that the proposed framework was very beneficial for improving speech recognition performance on the single-array track of CHiME-5. We obtained a 13.5% relative word error rate (WER) reduction over the official baseline system by improving only the front-end speech enhancement framework.
Pages: 133-140
Page count: 8