Combining speech enhancement and auditory feature extraction for robust speech recognition

Cited by: 40
Authors
Kleinschmidt, M [1]
Tchorz, J [1]
Kollmeier, B [1]
Affiliations
[1] Carl von Ossietzky Univ Oldenburg, AG Med Phys, D-26111 Oldenburg, Germany
Keywords
robust speech recognition; perceptive modeling; auditory front end; speech enhancement
DOI
10.1016/S0167-6393(00)00047-9
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A major deficiency in state-of-the-art automatic speech recognition (ASR) systems is the lack of robustness in additive and convolutional noise. The model of auditory perception (PEMO), developed by Dau et al. (T. Dau, D. Püschel, A. Kohlrausch, J. Acoust. Soc. Am. 99 (6) (1996) 3615-3622) for psychoacoustical purposes, partly overcomes these difficulties when used as a front end for automatic speech recognition. To further improve the performance of this auditory-based recognition system in background noise, different speech enhancement methods were examined, which have been evaluated in earlier studies as components of digital hearing aids. Monaural noise reduction, as proposed by Ephraim and Malah (Y. Ephraim, D. Malah, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (6) (1984) 1109-1121), was compared to a binaural filter and dereverberation algorithm after Wittkop et al. (T. Wittkop, S. Albani, V. Hohmann, J. Peissig, W. Woods, B. Kollmeier, Acustica United with Acta Acustica 83 (4) (1997) 684-699). Both noise reduction algorithms yield improvements in recognition performance equivalent to up to 10 dB SNR in non-reverberant conditions for all types of noise, while the performance in clean speech is not significantly affected. Even in real-world reverberant conditions the speech enhancement schemes lead to improvements in recognition performance comparable to an SNR gain of up to 5 dB. This effect exceeds the expectations as earlier studies found no increase in speech intelligibility for hearing-impaired human subjects. © 2001 Elsevier Science B.V. All rights reserved.
Pages: 75-91
Page count: 17
Related papers
50 in total
  • [31] An auditory model for robust speech recognition
    Luo, Xuewen
    Soon, Ing Yann
    Yeo, Chai Kiat
    2008 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING, VOLS 1 AND 2, PROCEEDINGS, 2008, : 1105 - 1109
  • [32] Visual speech feature extraction for improved speech recognition
    Zhang, X
    Mersereau, RM
    Clements, M
    Broun, CC
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 1993 - 1996
  • [33] Robust distributed speech recognition using speech enhancement
    Flynn, Ronan
    Jones, Edward
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2008, 54 (03) : 1267 - 1273
  • [34] Compensation of speech enhancement distortion for robust speech recognition
    Ding, P
    Cao, ZG
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 449 - 452
  • [35] SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION IN MOTORCYCLE ENVIRONMENT
    Mporas, Iosif
    Ganchev, Todor
    Kocsis, Otilia
    Fakotakis, Nikos
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2010, 19 (02) : 159 - 173
  • [36] Robust recognition of noisy speech using speech enhancement
    Xu, YF
    Zhang, JJ
    Yao, KS
    Cao, ZG
    Ma, ZX
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 734 - 737
  • [37] Predicted walk with correlation in particle filter speech feature enhancement for robust automatic speech recognition
    Woelfel, Matthias
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4705 - 4708
  • [38] Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition
    Ishizuka, K
    Miyazaki, N
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 141 - 144
  • [39] FEATURE ENHANCEMENT FOR ROBUST SPEECH RECOGNITION ON SMARTPHONES WITH DUAL-MICROPHONE
    Lopez-Espejo, Ivan
    Gomez, Angel M.
    Gonzalez, Jose A.
    Peinado, Antonio M.
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 21 - 25
  • [40] Feature enhancement by speaker-normalized splice for robust speech recognition
    Shinohara, Yusuke
    Masuko, Takashi
    Akamine, Masami
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4881 - 4884