Mask estimation for missing data speech recognition based on statistics of binaural interaction

Cited by: 38
Authors
Harding, S [1]
Barker, J [1]
Brown, GJ [1]
Affiliation
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Keywords
automatic speech recognition; binaural; computational auditory scene analysis (CASA); interaural level differences (ILD); interaural time differences (ITD); missing data; reverberation
DOI
10.1109/TSA.2005.860354
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with the "missing data" approach for robust speech recognition in noise. Missing data time-frequency masks are created using probability distributions based on estimates of interaural time and level differences (ITD and ILD) for mixed utterances in reverberant conditions; these masks indicate which regions of the spectrum constitute reliable evidence of the target speech signal. A number of experiments compare the relative efficacy of the binaural cues when used individually and in combination. We also investigate the ability of the system to generalize to acoustic conditions not encountered during training. Performance on a continuous digit recognition task using this method is found to be good, even in a particularly challenging environment with three concurrent male talkers.
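The abstract states that masks are derived from probability distributions over ITD and ILD estimates but does not spell out the formulation. The Python sketch below illustrates the general idea under stated assumptions: ILD is taken as the dB ratio of left and right energy in one time-frequency unit, ITD as the lag of the cross-correlation peak between filtered left and right frames, and the probability that a unit is reliable is read off per-channel 2-D histograms of (ITD, ILD) counted on training units labelled reliable or unreliable (for example, by an a priori mask). All names (`ild_db`, `itd_lag`, `BinauralMaskModel`, `estimate_mask`), the bin widths, the add-one smoothing and the 0.5 threshold are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def ild_db(e_left, e_right, eps=1e-12):
    """Hypothetical helper: interaural level difference (dB) of one
    time-frequency unit, from left/right channel energies."""
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


def itd_lag(left, right, max_lag):
    """Hypothetical helper: lag (in samples) maximising the cross-correlation
    of two equal-length frames over -max_lag..max_lag.
    A positive lag means the right channel lags the left."""
    n = len(left)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


class BinauralMaskModel:
    """Assumed model: per-channel 2-D histograms of quantised (ITD, ILD),
    counting training units labelled unreliable (index 0) or reliable (index 1)."""

    def __init__(self, itd_bin=1, ild_bin=1.0, prior=1.0):
        self.itd_bin = itd_bin
        self.ild_bin = ild_bin
        self.prior = prior  # add-one style smoothing count (assumption)
        self.counts = defaultdict(lambda: [0, 0])

    def _key(self, channel, itd, ild):
        # Quantise the cue pair into a histogram bin for this channel.
        return (channel, math.floor(itd / self.itd_bin), math.floor(ild / self.ild_bin))

    def train(self, channel, itd, ild, reliable):
        self.counts[self._key(channel, itd, ild)][int(reliable)] += 1

    def p_reliable(self, channel, itd, ild):
        # Smoothed relative frequency of "reliable" in the bin (no new entries created).
        bad, good = self.counts.get(self._key(channel, itd, ild), (0, 0))
        return (good + self.prior) / (good + bad + 2 * self.prior)


def estimate_mask(model, itd_ild, threshold=None):
    """itd_ild[t][c] is the (ITD, ILD) pair for frame t, channel c.
    Returns a soft mask of probabilities, or a 0/1 mask if threshold is given."""
    mask = []
    for frame in itd_ild:
        row = []
        for c, (itd, ild) in enumerate(frame):
            p = model.p_reliable(c, itd, ild)
            row.append(p if threshold is None else int(p > threshold))
        mask.append(row)
    return mask


# Toy usage: target near ITD = 0, ILD = 0 dB is reliable;
# a masker near ITD = +5 samples, ILD = +6 dB is not.
model = BinauralMaskModel()
for _ in range(20):
    model.train(0, 0, 0.5, True)
    model.train(0, 5, 6.2, False)
print(estimate_mask(model, [[(0, 0.3)], [(5, 6.0)]], threshold=0.5))  # [[1], [0]]
```

Omitting the threshold returns the posterior itself as a soft mask, which may suit missing-data decoders that weight each unit by its estimated reliability; passing a threshold yields a binary mask.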
Pages: 58-67
Page count: 10
Related papers (50 in total)
  • [31] Robust speech recognition using signal processing based on binaural perception
    Stern, RM
    Sullivan, TM
    ACUSTICA, 1996, 82 : S92 - S92
  • [32] A neural oscillator sound separator for missing data speech recognition
    Brown, GJ
    Barker, J
    Wang, DL
    IJCNN'01: INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2001 : 2907 - 2912
  • [33] Robust automatic speech recognition with missing and unreliable acoustic data
    Cooke, M
    Green, P
    Josifovski, L
    Vizinho, A
    SPEECH COMMUNICATION, 2001, 34 (03) : 267 - 285
  • [34] Bounded cepstral marginalization of missing data for robust speech recognition
    Kafoori, Kian Ebrahim
    Ahadi, Seyed Mohammad
    COMPUTER SPEECH AND LANGUAGE, 2016, 36 : 1 - 23
  • [35] Handling Convolutional Noise in Missing Data Automatic Speech Recognition
    Van Segbroeck, Maarten
    Van Hamme, Hugo
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006 : 2562 - 2565
  • [36] Speech recognition with missing data using recurrent neural nets
    Parveen, S
    Green, PD
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 1189 - 1195
  • [37] Robust speech recognition using missing feature theory and target speech enhancement based on degenerate unmixing and estimation technique
    Kim, Minook
    Kim, Ji-Seon
    Park, Hyung-Min
    INDEPENDENT COMPONENT ANALYSES, WAVELETS, NEURAL NETWORKS, BIOSYSTEMS, AND NANOENGINEERING IX, 2011, 8058
  • [38] Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise
    Kim, Wooil
    Stern, Richard M.
    SPEECH COMMUNICATION, 2011, 53 (01) : 1 - 11
  • [39] Robust Automatic Speech Recognition with Decoder Oriented Ideal Binary Mask Estimation
    Kim, Lae-Hoon
    Kim, Kyung-Tae
    Hasegawa-Johnson, Mark
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010 : 2066 - 2069
  • [40] Improving Speech Recognition of Two Simultaneous Speech Signals by Integrating ICA BSS and Automatic Missing Feature Mask Generation
    Takeda, Ryu
    Yamamoto, Shun'ichi
    Komatani, Kazunori
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006 : 2302 - 2305