Combining speech enhancement and auditory feature extraction for robust speech recognition

Cited by: 40

Authors:

Kleinschmidt, M [1]

Tchorz, J [1]

Kollmeier, B [1]

Affiliation:

[1] Carl von Ossietzky Univ Oldenburg, AG Med Phys, D-26111 Oldenburg, Germany

Source:

SPEECH COMMUNICATION | 2001 / Vol. 34 / Issue 1-2

Keywords:

robust speech recognition; perceptive modeling; auditory front end; speech enhancement;

DOI:

10.1016/S0167-6393(00)00047-9

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

A major deficiency in state-of-the-art automatic speech recognition (ASR) systems is the lack of robustness in additive and convolutional noise. The model of auditory perception (PEMO), developed by Dau et al. (T. Dau, D. Püschel, A. Kohlrausch, J. Acoust. Soc. Am. 99 (6) (1996) 3615-3622) for psychoacoustical purposes, partly overcomes these difficulties when used as a front end for automatic speech recognition. To further improve the performance of this auditory-based recognition system in background noise, different speech enhancement methods were examined, which have been evaluated in earlier studies as components of digital hearing aids. Monaural noise reduction, as proposed by Ephraim and Malah (Y. Ephraim, D. Malah, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (6) (1984) 1109-1121), was compared to a binaural filter and dereverberation algorithm after Wittkop et al. (T. Wittkop, S. Albani, V. Hohmann, J. Peissig, W. Woods, B. Kollmeier, Acustica United with Acta Acustica 83 (4) (1997) 684-699). Both noise reduction algorithms yield improvements in recognition performance equivalent to up to 10 dB SNR in non-reverberant conditions for all types of noise, while the performance in clean speech is not significantly affected. Even in real-world reverberant conditions the speech enhancement schemes lead to improvements in recognition performance comparable to an SNR gain of up to 5 dB. This effect exceeds the expectations as earlier studies found no increase in speech intelligibility for hearing-impaired human subjects. (C) 2001 Elsevier Science B.V. All rights reserved.

Pages: 75-91

Page count: 17

50 items in total

[31] An auditory model for robust speech recognition
Luo, Xuewen
Soon, Ing Yann
Yeo, Chai Kiat
2008 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING, VOLS 1 AND 2, PROCEEDINGS, 2008, : 1105 - 1109
[32] Visual speech feature extraction for improved speech recognition
Zhang, X
Mersereau, RM
Clements, M
Broun, CC
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 1993 - 1996
[33] Robust distributed speech recognition using speech enhancement
Flynn, Ronan
Jones, Edward
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2008, 54 (03) : 1267 - 1273
[34] Compensation of speech enhancement distortion for robust speech recognition
Ding, P
Cao, ZG
2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 449 - 452
[35] SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION IN MOTORCYCLE ENVIRONMENT
Mporas, Iosif
Ganchev, Todor
Kocsis, Otilia
Fakotakis, Nikos
INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2010, 19 (02) : 159 - 173
[36] Robust recognition of noisy speech using speech enhancement
Xu, YF
Zhang, JJ
Yao, KS
Cao, ZG
Ma, ZX
2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 734 - 737
[37] Predicted walk with correlation in particle filter speech feature enhancement for robust automatic speech recognition
Woelfel, Matthias
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4705 - 4708
[38] Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition
Ishizuka, K
Miyazaki, N
2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 141 - 144
[39] FEATURE ENHANCEMENT FOR ROBUST SPEECH RECOGNITION ON SMARTPHONES WITH DUAL-MICROPHONE
Lopez-Espejo, Ivan
Gomez, Angel M.
Gonzalez, Jose A.
Peinado, Antonio M.
2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 21 - 25
[40] Feature enhancement by speaker-normalized splice for robust speech recognition
Shinohara, Yusuke
Masuko, Takashi
Akamine, Masami
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4881 - 4884