Time-Frequency Correlation-Based Missing-Feature Reconstruction for Robust Speech Recognition in Band-Restricted Conditions

Cited by: 12
Authors
Kim, Wooil [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, CRSS, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
Keywords
Band-limited speech; correlation; missing-feature; speech recognition; time-frequency (TF); COMPENSATION;
DOI
10.1109/TASL.2009.2015080
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Band-limited speech is one of the most challenging conditions for robust speech recognition. This is especially true in spoken document retrieval, where effective automatic speech recognition must support audio corpora drawn from sources with a wide range of recording conditions. The conventional missing-feature reconstruction method has a problem when applied to band-limited speech, since it assumes that the observations in the unreliable regions are always greater than the latent original clean speech. To mitigate this problem, the approach developed here computes the posterior probability from the reliable components only. This study proposes an advanced method that exploits the correlation of spectral components across the time and frequency axes in order to improve missing-feature reconstruction in band-limited conditions. We employ an F1 Area Window and a Cutoff Border Window to include more knowledge of the reliable components that are highly correlated with the cutoff frequency band. To detect the cutoff regions for missing-feature reconstruction, a blind mask estimation scheme is also presented, which employs a synthesized band-limited speech model and requires no secondary training data. The proposed methods are evaluated using the SPHINX3 speech recognition engine and the TIMIT corpus. Experimental results demonstrate that the proposed time-frequency (TF) correlation-based missing-feature reconstruction method is significantly more effective in improving band-limited speech recognition accuracy. Compared to earlier formulated methods, the proposed TF missing-feature reconstruction method yields an average relative improvement in word error rate (WER) of up to 14.61% over four available bandwidths with cutoff frequencies of 1.0, 1.5, 2.0, and 2.5 kHz. Experimental results on the National Gallery of the Spoken Word (NGSW) corpus also show that the proposed method is effective in improving band-limited speech recognition in real-life spoken document conditions.
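
The idea of reconstructing the missing bands from the reliable components alone can be illustrated with a minimal sketch. This is not the authors' method: it assumes a single Gaussian model of clean log-spectral frames (in place of a trained clustered or mixture model), it does not implement the F1 Area Window or the Cutoff Border Window, and the function names `cutoff_mask` and `reconstruct_frame` as well as all numbers are hypothetical. It only shows how bands above a cutoff frequency can be filled in with their conditional mean given the bands below it.

```python
import numpy as np


def cutoff_mask(center_freqs_hz, cutoff_hz):
    """Return a boolean mask: True = reliable band, False = band above the cutoff."""
    return np.asarray(center_freqs_hz, dtype=float) <= cutoff_hz


def reconstruct_frame(frame, reliable, mean, cov):
    """Replace unreliable bands with their conditional mean given the reliable bands.

    Illustrative assumption: clean log-spectral frames follow one Gaussian
    N(mean, cov). The unreliable observations themselves are never used.
    """
    frame = np.asarray(frame, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)

    r = np.flatnonzero(reliable)    # indices of reliable bands
    u = np.flatnonzero(~reliable)   # indices of missing bands
    out = frame.copy()
    if u.size == 0:
        return out                  # nothing to reconstruct
    if r.size == 0:
        out[u] = mean[u]            # no evidence: fall back to the prior mean
        return out

    cov_rr = cov[np.ix_(r, r)]      # covariance among reliable bands
    cov_ur = cov[np.ix_(u, r)]      # cross-covariance missing vs. reliable
    # Conditional mean: mu_u + C_ur C_rr^{-1} (x_r - mu_r)
    out[u] = mean[u] + cov_ur @ np.linalg.solve(cov_rr, frame[r] - mean[r])
    return out


# Toy usage with made-up values (four bands, cutoff at 1.5 kHz).
center_freqs = [500.0, 1000.0, 1500.0, 2000.0]
mask = cutoff_mask(center_freqs, cutoff_hz=1500.0)
mean = np.zeros(4)
cov = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))  # unit variance, 0.5 correlation
observed = np.array([1.0, 1.0, 1.0, -20.0])    # last band lost to band-limiting
restored = reconstruct_frame(observed, mask, mean, cov)
```

Because only the reliable bands enter the estimate, the sketch places no bound on the reconstructed values from the attenuated observation, which mirrors the motivation stated in the abstract; a fuller treatment would weight several such estimates by cluster posteriors computed from the reliable components.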
Pages: 1292-1304
Number of pages: 13