Time-Frequency Correlation-Based Missing-Feature Reconstruction for Robust Speech Recognition in Band-Restricted Conditions

Cited by: 12
Authors
Kim, Wooil [1 ]
Hansen, John H. L. [1 ]
Affiliations
[1] Univ Texas Dallas, CRSS, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009 / Vol. 17 / No. 07
Keywords
Band-limited speech; correlation; missing-feature; speech recognition; time-frequency (TF); COMPENSATION;
DOI
10.1109/TASL.2009.2015080
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Band-limited speech represents one of the most challenging factors for robust speech recognition. This is especially true in supporting audio corpora from sources that have a range of conditions in spoken document retrieval requiring effective automatic speech recognition. The missing-feature reconstruction method has a problem when applied to band-limited speech reconstruction, since it assumes the observations in the unreliable regions are always greater than the latent original clean speech. The approach developed here depends only on reliable components to calculate the posterior probability to mitigate the problem. This study proposes an advanced method to effectively utilize the correlation information of the spectral components across time and frequency axes in an effort to increase the performance of missing-feature reconstruction in band-limited conditions. We employ an F1 Area Window and Cutoff Border Window in order to include more knowledge on reliable components which are highly correlated with the cutoff frequency band. To detect the cutoff regions for missing-feature reconstruction, blind mask estimation is also presented, which employs the synthesized band-limited speech model without secondary training data. Experiments to evaluate the performance of the proposed methods are accomplished using the SPHINX3 speech recognition engine and the TIMIT corpus. Experimental results demonstrate that the proposed time-frequency (TF) correlation based missing-feature reconstruction method is significantly more effective in improving band-limited speech recognition accuracy. By employing the proposed TF-missing feature reconstruction method, we obtain up to 14.61% of average relative improvement in word error rate (WER) for four available bandwidths with cutoff frequencies 1.0, 1.5, 2.0, and 2.5 kHz, respectively, compared to earlier formulated methods. 
Experimental results on the National Gallery of the Spoken Word (NGSW) corpus also show the proposed method is effective in improving band-limited speech recognition in real-life spoken document conditions.
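The "average relative improvement in WER" quoted in the abstract can be read as the mean of per-condition relative reductions. The sketch below illustrates that reading only; the function names are hypothetical, and the abstract does not say whether the average is taken over per-cutoff relative improvements (as assumed here) or computed from averaged WERs. No figures from the paper are reproduced.

```python
from statistics import mean


def relative_wer_improvement(baseline_wer: float, proposed_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


def average_relative_improvement(pairs) -> float:
    """Mean of per-condition relative improvements.

    `pairs` is an iterable of (baseline_wer, proposed_wer) tuples, one per
    test condition (for example, one per cutoff frequency). This averaging
    convention is an assumption, not something the abstract specifies.
    """
    return mean(relative_wer_improvement(b, p) for b, p in pairs)
```

Under this convention, a condition whose WER falls from 20% to 15% contributes a 25% relative improvement, regardless of the absolute 5-point drop.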
Pages: 1292 - 1304
Page count: 13
Related papers
50 in total
  • [21] Isolate Speech Recognition Based on Time-Frequency Analysis Methods
    Mantilla-Caeiros, Alfredo
    Nakano Miyatake, Mariko
    Perez-Meana, Hector
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS, 2009, 5856 : 297 - +
  • [22] A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition
    Kim, Wooil
    Hansen, John H. L.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (05): : 1434 - 1443
  • [23] Noise estimation based on time-frequency correlation for speech enhancement
    Yuan, Wenhao
    Lin, Jiajun
    An, Wei
    Wang, Yu
    Chen, Ning
    APPLIED ACOUSTICS, 2013, 74 (05) : 770 - 781
  • [24] Robust Beamforming for Speech Recognition Using DNN-Based Time-Frequency Masks Estimation
    Jiang, Wenbin
    Wen, Fei
    Liu, Peilin
    IEEE ACCESS, 2018, 6 : 52385 - 52392
  • [25] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
    Dorothea Kolossa
    Ramon Fernandez Astudillo
    Eugen Hoffmann
    Reinhold Orglmeister
    EURASIP Journal on Audio, Speech, and Music Processing, 2010
  • [26] Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
    Kolossa, Dorothea
    Astudillo, Ramon Fernandez
    Hoffmann, Eugen
    Orglmeister, Reinhold
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2010,
  • [27] A time-frequency correlation-based blind source separation method for time-delayed mixtures
    Puigt, Matthieu
    Deville, Yannick
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 5711 - 5714
  • [28] Radar emitter recognition based on the deep learning of time-frequency feature
    Li D.
    Yang R.
    Dong R.
    Yang, Ruijuan (ruijuany@sohu.com), 1600, National University of Defense Technology (42): : 112 - 119
  • [29] A Flow Correlation Scheme Based on Perceptual Hash and Time-Frequency Feature
    Wang, Zhe
    Chen, Yonghong
    Wang, Linfan
    Xie, Jinpu
    PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020), 2020, : 2023 - 2027
  • [30] Lost Speech Reconstruction Method using Speech Recognition based on Missing Feature Theory and HMM-based Speech Synthesis
    Kuroiwa, Shingo
    Tsuge, Satoru
    Ren, Fuji
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1105 - 1108