Combining speech enhancement and auditory feature extraction for robust speech recognition

Cited by: 40
Authors
Kleinschmidt, M [1]
Tchorz, J [1]
Kollmeier, B [1]
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, AG Med Phys, D-26111 Oldenburg, Germany
Keywords
robust speech recognition; perceptive modeling; auditory front end; speech enhancement
DOI
10.1016/S0167-6393(00)00047-9
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
A major deficiency in state-of-the-art automatic speech recognition (ASR) systems is the lack of robustness in additive and convolutional noise. The model of auditory perception (PEMO), developed by Dau et al. (T. Dau, D. Püschel, A. Kohlrausch, J. Acoust. Soc. Am. 99 (6) (1996) 3615-3622) for psychoacoustical purposes, partly overcomes these difficulties when used as a front end for automatic speech recognition. To further improve the performance of this auditory-based recognition system in background noise, several speech enhancement methods were examined that had been evaluated in earlier studies as components of digital hearing aids. Monaural noise reduction, as proposed by Ephraim and Malah (Y. Ephraim, D. Malah, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (6) (1984) 1109-1121), was compared to a binaural filter and dereverberation algorithm after Wittkop et al. (T. Wittkop, S. Albani, V. Hohmann, J. Peissig, W. Woods, B. Kollmeier, Acustica United with Acta Acustica 83 (4) (1997) 684-699). Both noise reduction algorithms yield improvements in recognition performance equivalent to up to 10 dB SNR in non-reverberant conditions for all types of noise, while the performance in clean speech is not significantly affected. Even in real-world reverberant conditions, the speech enhancement schemes lead to improvements in recognition performance comparable to an SNR gain of up to 5 dB. This effect exceeds expectations, as earlier studies found no increase in speech intelligibility for hearing-impaired human subjects. © 2001 Elsevier Science B.V. All rights reserved.
Pages: 75-91
Number of pages: 17
Related papers
50 in total
  • [1] Combining speech enhancement with feature post-processing for robust speech recognition
    Lei, Jianjun
    Guo, Jun
    Liu, Gang
    Wang, Jian
    Nie, Xiangfei
    Yang, Zhen
    INTELLIGENT COMPUTING IN SIGNAL PROCESSING AND PATTERN RECOGNITION, 2006, 345 : 773 - 778
  • [2] Robust Feature Extraction for Speech Recognition by Enhancing Auditory Spectrum
    Alam, Md Jahangir
    Kenny, Patrick
    O'Shaughnessy, Douglas
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1358 - 1361
  • [3] An auditory neural feature extraction method for robust speech recognition
    Guo, Wei
    Zhang, Liqing
    Xia, Bin
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 793 - +
  • [4] Feature extraction based on auditory representations for robust speech recognition
    Kim, DS
    Lee, SY
    Kil, RM
    Zhu, XL
    ELECTRONICS LETTERS, 1997, 33 (01) : 15 - 16
  • [5] Feature extraction for robust speech recognition
    Dharanipragada, S
    2002 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II, PROCEEDINGS, 2002, : 855 - 858
  • [6] Combined speech enhancement and auditory modelling for robust distributed speech recognition
    Flynn, Ronan
    Jones, Edward
    SPEECH COMMUNICATION, 2008, 50 (10) : 797 - 809
  • [7] Geometrical feature extraction for robust speech recognition
    Li, Xiaokun
    Kwan, Chiman
    2005 39TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1 AND 2, 2005, : 558 - 562
  • [8] AN AUDITORY-BASED FEATURE FOR ROBUST SPEECH RECOGNITION
    Shao, Yang
    Jin, Zhaozhang
    Wang, DeLiang
    Srinivasan, Soundararajan
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4625 - +
  • [9] Fusion Feature Extraction Based on Auditory and Energy for Noise-Robust Speech Recognition
    Shi, Yanyan
    Bai, Jing
    Xue, Peiyun
    Shi, Dianxi
    IEEE ACCESS, 2019, 7 : 81911 - 81922
  • [10] Auditory-modeling inspired methods of feature extraction for robust automatic speech recognition
    Jing, ZN
    Hasegawa-Johnson, M
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 4176 - 4176