Environment-dependent denoising autoencoder for distant-talking speech recognition

Cited by: 0
Authors
Yuma Ueda
Longbiao Wang
Atsuhiko Kai
Bo Ren
Institutions
[1] Shizuoka University, Graduate School of Engineering
[2] Nagaoka University of Technology
Keywords
Speech recognition; Dereverberation; Denoising autoencoder; Environment identification; Distant-talking speech;
Abstract
In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because they can learn higher-level representations and a more flexible feature mapping function. However, a DAE performs poorly when the training and test environments are mismatched. In a conventional DAE, parameters are trained using pairs of reverberant and clean speech under various acoustic conditions (that is, an environment-independent DAE). To address this problem, we propose two environment-dependent DAEs that reduce the influence of mismatches between training and test environments. In the first approach, we train several DAEs on speech from different acoustic environments, and the DAE trained under the condition that best matches the test condition is selected automatically (a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both the reverberant speech and blindly estimated reverberation. In the second approach, we append estimated reverberation features to the input of the DAE (a one-step environment-dependent DAE, or reverberation-aware DAE). The proposed methods are evaluated on speech recorded in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For the two-step environment-dependent DAE, environment identification based on the proposed DNN also outperforms the conventional DNN approach, which uses only reverberant speech without blindly estimated reverberation. Furthermore, the one-step environment-dependent DAE significantly outperforms the two-step one.
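The one-step (reverberation-aware) DAE described above can be sketched as a feed-forward network whose input is a reverberant feature frame concatenated with estimated reverberation features. The following is a minimal illustrative sketch, not the authors' implementation: all dimensions, the single hidden layer, and the randomly initialized stand-in weights are assumptions; in practice the parameters are trained on pairs of (reverberant speech + reverberation features, clean speech).

```python
import numpy as np

rng = np.random.default_rng(0)

N_MEL = 24      # assumed spectral feature dimension per frame
N_REVERB = 4    # assumed dimension of the estimated reverberation features
N_HIDDEN = 64   # assumed hidden-layer width

# Stand-in parameters; a real system would learn these from
# (reverberant + reverberation features, clean) training pairs.
W1 = rng.standard_normal((N_HIDDEN, N_MEL + N_REVERB)) * 0.1
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_MEL, N_HIDDEN)) * 0.1
b2 = np.zeros(N_MEL)

def reverberation_aware_dae(reverberant_frame, reverb_features):
    """Map one reverberant feature frame, augmented with estimated
    reverberation features, to an enhanced (dereverberated) frame."""
    # Key idea of the one-step approach: the reverberation estimate
    # is appended to the DAE input rather than used to pick a model.
    x = np.concatenate([reverberant_frame, reverb_features])
    h = np.tanh(W1 @ x + b1)   # hidden representation
    return W2 @ h + b2         # linear output layer

frame = rng.standard_normal(N_MEL)      # one reverberant feature frame
reverb = rng.standard_normal(N_REVERB)  # blindly estimated reverberation
enhanced = reverberation_aware_dae(frame, reverb)
print(enhanced.shape)  # (24,)
```

By contrast, the two-step variant would first feed the same augmented input to a classifier DNN to pick an environment, then apply that environment's separately trained DAE.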
Published in: EURASIP Journal on Advances in Signal Processing, 2015