Mask estimation for missing data speech recognition based on statistics of binaural interaction

Cited by: 38
Authors
Harding, S [1]
Barker, J [1]
Brown, GJ [1]
Affiliation
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Keywords
automatic speech recognition; binaural; computational auditory scene analysis (CASA); interaural level differences (ILD); interaural time differences (ITD); missing data; reverberation
DOI
10.1109/TSA.2005.860354
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with the "missing data" approach for robust speech recognition in noise. Missing data time-frequency masks are created using probability distributions based on estimates of interaural time and level differences (ITD and ILD) for mixed utterances in reverberant conditions; these masks indicate which regions of the spectrum constitute reliable evidence of the target speech signal. A number of experiments compare the relative efficacy of the binaural cues when used individually and in combination. We also investigate the ability of the system to generalize to acoustic conditions not encountered during training. Performance on a continuous digit recognition task using this method is found to be good, even in a particularly challenging environment with three concurrent male talkers.
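The abstract does not spell out the estimator, so the following is only a minimal, hypothetical sketch of histogram-based binaural mask estimation. It assumes the following. ITD is the lag (in samples) of the cross-correlation peak within each time-frequency unit, and ILD is the left/right energy ratio in dB. A per-channel joint histogram of quantised (ITD, ILD) values is accumulated from training mixtures in which each unit is labelled target-dominated when its local SNR exceeds 0 dB. The mask value for a unit is the fraction of training units in the matching bin that were target-dominated. The names `estimate_itd`, `estimate_ild`, `CueHistogram` and `estimate_mask`, the 1-sample/1-dB bin widths and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import defaultdict


def estimate_itd(left, right, max_lag):
    """ITD estimate (in samples) for one time-frequency unit: the lag in
    [-max_lag, max_lag] that maximises the raw cross-correlation.
    A positive value means the right-ear signal lags the left-ear signal."""
    n = len(left)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def estimate_ild(left, right, eps=1e-12):
    """ILD estimate in dB: ratio of left-ear to right-ear energy in the unit."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


class CueHistogram:
    """Hypothetical per-channel joint histogram over quantised (ITD, ILD) bins.
    Training labels (target-dominated or not) are assumed to come from local
    SNR > 0 dB, computed from the separately available target and masker."""

    def __init__(self, itd_step=1.0, ild_step=1.0):
        self.itd_step = itd_step
        self.ild_step = ild_step
        self.target = defaultdict(int)  # target-dominated units per bin
        self.total = defaultdict(int)   # all units per bin

    def _bin(self, itd, ild):
        return (round(itd / self.itd_step), round(ild / self.ild_step))

    def add(self, itd, ild, target_dominant):
        b = self._bin(itd, ild)
        self.total[b] += 1
        if target_dominant:
            self.target[b] += 1

    def prob(self, itd, ild):
        """Relative-frequency estimate of P(target dominant | ITD, ILD);
        bins never seen in training are treated as unreliable (0.0)."""
        b = self._bin(itd, ild)
        n = self.total.get(b, 0)
        return self.target.get(b, 0) / n if n else 0.0


def estimate_mask(cues, hists, threshold=0.5):
    """cues[c][t] is the (itd, ild) pair for channel c, frame t.
    Returns a soft mask of probabilities and a binary mask thresholded from it."""
    soft = [[hists[c].prob(itd, ild) for itd, ild in row] for c, row in enumerate(cues)]
    binary = [[p > threshold for p in row] for row in soft]
    return soft, binary


# Toy training data for one channel: target near ITD 0 / ILD 0 dB,
# interferer near ITD 5 samples / ILD 6 dB.
h = CueHistogram()
for _ in range(3):
    h.add(0, 0.0, True)
h.add(0, 0.0, False)
for _ in range(4):
    h.add(5, 6.0, False)

soft, binary = estimate_mask([[(0, 0.2), (5, 5.8)]], [h])
print(soft)    # [[0.75, 0.0]]
print(binary)  # [[True, False]]
```

In this sketch the soft mask could be passed to a soft-decision missing-data recogniser and the binary mask to a discrete one. Building the histogram over ITD alone or ILD alone amounts to marginalising over the other cue, which is one simple way to compare the cues individually and in combination.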
Pages: 58-67
Number of pages: 10