A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition

Cited by: 12

Authors:

Kim, Wooil [1]

Hansen, John H. L. [1]

Affiliations:

[1] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, CRSS, Richardson, TX 75080 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011, Vol. 19, No. 5

Keywords:

Background noise; mask estimation; missing-feature; posterior-based representative mean (PRM) estimate; robust speech recognition; COMPENSATION; NOISE; ENHANCEMENT; RECONSTRUCTION;

DOI:

10.1109/TASL.2010.2091633

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in various types of background noise conditions. A conventional mask estimation method based on spectral subtraction degrades performance, due to incorrect estimation of the noise signal which fails to accurately represent the variations of background noise during the incoming speech utterance. The proposed mask estimation method utilizes a Posterior-based Representative Mean (PRM) estimate for determining the reliability of the input speech spectral components, which is obtained as a weighted sum of the mean parameters of the speech model using the posterior probability. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method. Experimental results demonstrate that the proposed mask estimation method provides more separable distributions for the reliable/unreliable component classifier compared to the conventional mask estimation method. The recognition performance is evaluated using the Aurora 2.0 framework over various types of background noise conditions and the CU-Move real-life in-vehicle corpus. The performance evaluation shows that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in various types of background noise conditions, compared to the conventional mask estimation method which is based on spectral subtraction. By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +23.41% and +9.45% average relative improvements in word error rate for all four types of noise conditions and CU-Move corpus, respectively, compared to conventional mask estimation methods.
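The abstract describes the PRM estimate as a weighted sum of the speech model's mean parameters, with posterior probabilities as the weights. The following is a minimal sketch of that idea only, assuming a diagonal-covariance Gaussian mixture model. The function names, the toy parameters, and the use of a single model for both the posteriors and the means are illustrative assumptions, not details taken from the paper. How the estimate is then used to decide reliability is not specified in this record, so it is left out.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given the frame x."""
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_joint)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(lj - peak) for lj in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def posterior_weighted_mean(x, weights, means, variances):
    """Weighted sum of component means, weighted by the posteriors (assumed form)."""
    post = component_posteriors(x, weights, means, variances)
    return [
        sum(p * m[d] for p, m in zip(post, means))
        for d in range(len(x))
    ]


# Hypothetical two-component model over a 2-dimensional spectral feature.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

frame = [2.0, 2.0]  # toy input frame, equidistant from both component means
estimate = posterior_weighted_mean(frame, weights, means, variances)
print(estimate)
```

Because the toy frame is equidistant from both component means, each component receives a posterior of 0.5 and the estimate is the average of the two means. A frame closer to one component pulls the estimate toward that component's mean.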


Pages: 1434 - 1443

Page count: 10

21 items in total

[1] Mask Estimation Employing Posterior-Based Representative Mean for Missing-Feature Speech Recognition with Time-Varying Background Noise
Kim, Wooil
Hansen, John H. L.
2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 194 - 198
[2] Soft Missing-Feature Mask Generation for Simultaneous Speech Recognition System in Robots
Takahashi, Toru
Yamamoto, Shun'ichi
Nakadai, Kazuhiro
Komatani, Kazunori
Ogata, Tetsuya
Okuno, Hiroshi G.
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 992 - +
[3] Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise
Kim, Wooil
Stern, Richard M.
SPEECH COMMUNICATION, 2011, 53 (01) : 1 - 11
[4] Mask Estimation Based on Band-Independent Bayesian Classifier for Missing-Feature Reconstruction
Kim, Wooil
Stern, Richard M.
Ko, Hanseok
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2006, 25 (02) : 78 - 87
[5] A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
Seltzer, ML
Raj, B
Stern, RM
SPEECH COMMUNICATION, 2004, 43 (04) : 379 - 393
[6] MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition
Gonzalez, Jose A.
Peinado, Antonio M.
Ma, Ning
Gomez, Angel M.
Barker, Jon
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (03) : 624 - 635
[7] Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears
Takeda, Ryu
Yamamoto, Shun'ichi
Komatani, Kazunori
Ogata, Tetsuya
Okuno, Hiroshi G.
2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12, 2006, : 878 - +
[8] Mask Estimation in Non-stationary Noise Environments for Missing Feature Based Robust Speech Recognition
Badiezadegan, Shirin
Rose, Richard C.
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2062 - 2065
[9] Mask estimation based on sound localisation for missing data speech recognition
Harding, S
Barker, J
Brown, GJ
2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 537 - 540
[10] Mask estimation for missing data speech recognition based on statistics of binaural interaction
Harding, S
Barker, J
Brown, GJ
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01) : 58 - 67
