Combining speech enhancement and auditory feature extraction for robust speech recognition

Times cited: 40
Authors
Kleinschmidt, M [1]
Tchorz, J [1]
Kollmeier, B [1]
Affiliation
[1] Carl von Ossietzky Univ Oldenburg, AG Med Phys, D-26111 Oldenburg, Germany
Keywords
robust speech recognition; perceptive modeling; auditory front end; speech enhancement
DOI
10.1016/S0167-6393(00)00047-9
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A major deficiency of state-of-the-art automatic speech recognition (ASR) systems is their lack of robustness to additive and convolutional noise. The model of auditory perception (PEMO) developed by Dau et al. (T. Dau, D. Puschel, A. Kohlrausch, J. Acoust. Soc. Am. 99 (6) (1996) 3615-3622) for psychoacoustical purposes partly overcomes these difficulties when used as a front end for automatic speech recognition. To further improve the performance of this auditory-based recognition system in background noise, several speech enhancement methods that had previously been evaluated as components of digital hearing aids were examined. Monaural noise reduction as proposed by Ephraim and Malah (Y. Ephraim, D. Malah, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (6) (1984) 1109-1121) was compared to a binaural filter and dereverberation algorithm after Wittkop et al. (T. Wittkop, S. Albani, V. Hohmann, J. Peissig, W. Woods, B. Kollmeier, Acustica United with Acta Acustica 83 (4) (1997) 684-699). In non-reverberant conditions, both noise reduction algorithms yield improvements in recognition performance equivalent to an SNR gain of up to 10 dB for all types of noise, while performance on clean speech is not significantly affected. Even in real-world reverberant conditions, the speech enhancement schemes improve recognition performance by an amount comparable to an SNR gain of up to 5 dB. This effect exceeds expectations, as earlier studies found no increase in speech intelligibility for hearing-impaired human subjects. (C) 2001 Elsevier Science B.V. All rights reserved.
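The abstract names the Ephraim and Malah (1984) monaural noise-reduction rule without describing it. The sketch below shows that estimator's per-frequency-bin gain, assuming the standard minimum mean-square error short-time spectral amplitude (MMSE-STSA) form with the decision-directed a priori SNR estimate and a smoothing constant alpha = 0.98. The noise power spectral density is assumed to be known, and all function names are hypothetical. This is not the implementation evaluated in the article, whose parameters are not given here.

```python
import math

EPS = 1e-10  # floor that keeps SNR values strictly positive (assumption)


def scaled_bessel_i(order: int, x: float) -> float:
    """Return exp(-x) * I_order(x) for x >= 0 (modified Bessel function, first kind).

    The power series is summed in log space so large arguments do not overflow.
    """
    if x < 0.0:
        raise ValueError("x must be non-negative")
    if x == 0.0:
        return 1.0 if order == 0 else 0.0
    log_half_x = math.log(x / 2.0)
    total = 0.0
    for k in range(int(x) + 60):  # terms well beyond k ~ x are negligible
        log_term = ((2 * k + order) * log_half_x
                    - math.lgamma(k + 1) - math.lgamma(k + order + 1) - x)
        total += math.exp(log_term)
    return total


def mmse_stsa_gain(xi: float, gamma: float) -> float:
    """MMSE-STSA gain for one bin.

    xi    -- a priori SNR (clean speech power / noise power)
    gamma -- a posteriori SNR (|noisy amplitude|^2 / noise power)
    """
    xi = max(xi, EPS)
    gamma = max(gamma, EPS)
    v = xi * gamma / (1.0 + xi)
    # exp(-v/2) * I_n(v/2) is taken from the scaled Bessel helper.
    bracket = ((1.0 + v) * scaled_bessel_i(0, v / 2.0)
               + v * scaled_bessel_i(1, v / 2.0))
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * bracket


def decision_directed_xi(prev_clean_amp: float, noise_psd: float,
                         gamma: float, alpha: float = 0.98) -> float:
    """Blend the previous frame's clean-speech estimate with the current
    maximum-likelihood estimate max(gamma - 1, 0)."""
    return (alpha * prev_clean_amp ** 2 / noise_psd
            + (1.0 - alpha) * max(gamma - 1.0, 0.0))


def enhance_frame(noisy_mag, noise_psd, prev_clean_mag, alpha=0.98):
    """Estimate clean spectral amplitudes for one frame, bin by bin."""
    clean = []
    for y, lam, a_prev in zip(noisy_mag, noise_psd, prev_clean_mag):
        gamma = y * y / lam
        xi = decision_directed_xi(a_prev, lam, gamma, alpha)
        clean.append(mmse_stsa_gain(xi, gamma) * y)
    return clean
```

In a full system these gains would be applied frame by frame to short-time Fourier magnitudes, with the noisy phase reused for resynthesis and the noise PSD estimated separately (for example, during speech pauses). Those parts are omitted here. At high SNR the gain approaches the Wiener value xi / (1 + xi). At low SNR it attenuates strongly.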
Pages: 75-91
Number of pages: 17