Time-Frequency Correlation-Based Missing-Feature Reconstruction for Robust Speech Recognition in Band-Restricted Conditions

Cited by: 12
Authors
Kim, Wooil [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, CRSS, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009, Vol. 17, No. 7
Keywords
Band-limited speech; correlation; missing-feature; speech recognition; time-frequency (TF); COMPENSATION;
DOI
10.1109/TASL.2009.2015080
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Band-limited speech is one of the most challenging conditions for robust speech recognition. This is especially true for spoken document retrieval, which requires effective automatic speech recognition over audio corpora drawn from sources with a wide range of conditions. The missing-feature reconstruction method has a problem when applied to band-limited speech, since it assumes that the observations in the unreliable regions are always greater than the latent original clean speech. To mitigate this problem, the approach developed here depends only on reliable components to calculate the posterior probability. This study proposes an advanced method that exploits the correlation of spectral components across the time and frequency axes to increase the performance of missing-feature reconstruction in band-limited conditions. We employ an F1 Area Window and a Cutoff Border Window to include more knowledge of the reliable components that are highly correlated with the cutoff frequency band. To detect the cutoff regions for missing-feature reconstruction, a blind mask estimation scheme is also presented, which employs a synthesized band-limited speech model without secondary training data. The proposed methods are evaluated using the SPHINX3 speech recognition engine and the TIMIT corpus. Experimental results demonstrate that the proposed time-frequency (TF) correlation-based missing-feature reconstruction method is significantly more effective in improving band-limited speech recognition accuracy. With the proposed TF missing-feature reconstruction method, we obtain up to a 14.61% average relative improvement in word error rate (WER) over four available bandwidths with cutoff frequencies of 1.0, 1.5, 2.0, and 2.5 kHz, compared to earlier formulated methods. Experimental results on the National Gallery of the Spoken Word (NGSW) corpus also show that the proposed method is effective in improving band-limited speech recognition in real-life spoken document conditions.
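For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an average relative WER improvement over several cutoff frequencies is commonly computed. It is not taken from the paper: the function names are hypothetical, the WER values are placeholder numbers chosen only to make the arithmetic easy to follow, and it assumes the per-cutoff relative improvements are averaged (the abstract does not state the exact averaging procedure).

```python
def relative_wer_improvement(baseline_wer, proposed_wer):
    """Relative WER reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


def average_relative_improvement(wer_by_cutoff):
    """Mean of the per-cutoff relative improvements.

    wer_by_cutoff maps a cutoff frequency in kHz to a
    (baseline_wer, proposed_wer) pair, both in percent.
    """
    gains = [relative_wer_improvement(b, p) for b, p in wer_by_cutoff.values()]
    return sum(gains) / len(gains)


# Placeholder WERs (NOT results from the paper), keyed by cutoff frequency in kHz.
example_wers = {
    1.0: (50.0, 40.0),  # 20% relative improvement
    1.5: (40.0, 34.0),  # 15%
    2.0: (30.0, 27.0),  # 10%
    2.5: (20.0, 19.0),  # 5%
}

average_gain = average_relative_improvement(example_wers)
print(f"average relative WER improvement: {average_gain:.2f}%")
```

Note that a relative improvement is measured against the baseline WER at each cutoff, so the same absolute reduction counts for more when the baseline error is already low.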
Pages: 1292-1304
Page count: 13