A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition

Cited by: 12
Authors
Kim, Wooil [1]
Hansen, John H. L. [1]
Affiliation
[1] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, CRSS, Richardson, TX 75080 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 5
Keywords
Background noise; mask estimation; missing-feature; posterior-based representative mean (PRM) estimate; robust speech recognition; COMPENSATION; NOISE; ENHANCEMENT; RECONSTRUCTION
DOI
10.1109/TASL.2010.2091633
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in various types of background noise conditions. A conventional mask estimation method based on spectral subtraction degrades performance, due to incorrect estimation of the noise signal which fails to accurately represent the variations of background noise during the incoming speech utterance. The proposed mask estimation method utilizes a Posterior-based Representative Mean (PRM) estimate for determining the reliability of the input speech spectral components, which is obtained as a weighted sum of the mean parameters of the speech model using the posterior probability. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method. Experimental results demonstrate that the proposed mask estimation method provides more separable distributions for the reliable/unreliable component classifier compared to the conventional mask estimation method. The recognition performance is evaluated using the Aurora 2.0 framework over various types of background noise conditions and the CU-Move real-life in-vehicle corpus. The performance evaluation shows that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in various types of background noise conditions, compared to the conventional mask estimation method which is based on spectral subtraction. By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +23.41% and +9.45% average relative improvements in word error rate for all four types of noise conditions and CU-Move corpus, respectively, compared to conventional mask estimation methods.
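The abstract describes the PRM estimate only as a posterior-weighted sum of the speech model's mean parameters. A minimal sketch of that one idea follows. It assumes a diagonal-covariance Gaussian mixture in a log-spectral domain, and it assumes that posteriors come from the noise-corrupted model while the weighted means come from the clean speech model. All names (`prm_estimate`, `hypothetical_mask`, `threshold`) and the final thresholding rule are illustrative assumptions, not the authors' implementation; the paper itself uses a reliable/unreliable component classifier whose details are not given here.

```python
import numpy as np


def prm_estimate(y, weights, noisy_means, noisy_vars, clean_means):
    """Posterior-weighted mean for one frame (illustrative sketch).

    y            : (D,)   observed noisy log-spectral vector
    weights      : (K,)   mixture weights
    noisy_means  : (K, D) means of the assumed noise-corrupted model
    noisy_vars   : (K, D) diagonal variances of that model
    clean_means  : (K, D) means of the assumed clean speech model
    """
    diff = y - noisy_means
    # Log-likelihood of y under each diagonal Gaussian component.
    log_lik = -0.5 * np.sum(
        diff ** 2 / noisy_vars + np.log(2.0 * np.pi * noisy_vars), axis=1
    )
    log_post = np.log(weights) + log_lik
    # Subtract the maximum before exponentiating for numerical stability.
    log_post -= np.max(log_post)
    post = np.exp(log_post)
    post /= post.sum()
    # Weighted sum of the mean parameters using the posterior probabilities.
    return post @ clean_means, post


def hypothetical_mask(y, prm, threshold=3.0):
    """Assumed decision rule: a component is 'reliable' when the observation
    exceeds the PRM estimate by less than `threshold` (not from the paper)."""
    return (y - prm) < threshold


# Toy two-component model with made-up parameters.
weights = np.array([0.5, 0.5])
noisy_means = np.array([[0.0, 0.0], [10.0, 10.0]])
noisy_vars = np.ones((2, 2))
clean_means = np.array([[-1.0, -1.0], [9.0, 9.0]])
y = np.array([0.0, 0.0])

prm, post = prm_estimate(y, weights, noisy_means, noisy_vars, clean_means)
mask = hypothetical_mask(y, prm)
```

In this toy setup the observation sits on the first noisy component, so nearly all posterior mass goes to it and the estimate lands close to the first clean mean. A real system would run this per frame and feed the result to a trained classifier rather than a fixed threshold.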
Pages: 1434-1443
Page count: 10