Mask estimation for missing data speech recognition based on statistics of binaural interaction

Cited by: 38
Authors
Harding, S [1]
Barker, J [1]
Brown, GJ [1]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Keywords
automatic speech recognition; binaural; computational auditory scene analysis (CASA); interaural level differences (ILD); interaural time differences (ITD); missing data; reverberation
DOI
10.1109/TSA.2005.860354
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with the "missing data" approach for robust speech recognition in noise. Missing data time-frequency masks are created using probability distributions based on estimates of interaural time and level differences (ITD and ILD) for mixed utterances in reverberated conditions; these masks indicate which regions of the spectrum constitute reliable evidence of the target speech signal. A number of experiments compare the relative efficacy of the binaural cues when used individually and in combination. We also investigate the ability of the system to generalize to acoustic conditions not encountered during training. Performance on a continuous digit recognition task using this method is found to be good, even in a particularly challenging environment with three concurrent male talkers.
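The mask-estimation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the probability distributions are approximated by a simple two-dimensional histogram over quantized ITD and ILD values, and that a time-frequency unit is marked reliable when the estimated probability of target dominance exceeds a threshold. The function names, bin widths (`itd_step`, `ild_step`), and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def quantize(value, step):
    """Map a continuous cue value to an integer histogram bin index."""
    return int(round(value / step))


def train_cue_histogram(samples, itd_step=0.1, ild_step=1.0):
    """Estimate P(target dominant | ITD bin, ILD bin) from labelled samples.

    samples: iterable of (itd_ms, ild_db, target_dominant) tuples, where
    target_dominant is True when the target is the stronger source in
    that time-frequency unit. Bin widths are illustrative assumptions.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [target count, total count]
    for itd, ild, is_target in samples:
        key = (quantize(itd, itd_step), quantize(ild, ild_step))
        counts[key][1] += 1
        if is_target:
            counts[key][0] += 1
    return {key: hit / total for key, (hit, total) in counts.items()}


def estimate_mask(itd, ild, prob, threshold=0.5, itd_step=0.1, ild_step=1.0):
    """Return a binary mask (list of rows) over time-frequency units.

    itd and ild are equally shaped nested lists of per-unit cue estimates.
    Cue bins never seen in training get probability 0.0 and are therefore
    marked unreliable (an assumption of this sketch).
    """
    mask = []
    for itd_row, ild_row in zip(itd, ild):
        row = []
        for t, l in zip(itd_row, ild_row):
            key = (quantize(t, itd_step), quantize(l, ild_step))
            row.append(1 if prob.get(key, 0.0) > threshold else 0)
        mask.append(row)
    return mask
```

In this sketch a mask value of 1 marks a unit as reliable evidence of the target and 0 marks it as missing. Looking up ITD and ILD jointly corresponds to using the cues in combination; a one-dimensional histogram over either cue alone would correspond to using them individually.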
Pages: 58-67
Number of pages: 10