Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Times cited: 0

Authors:

Wang, Longbiao [1]

Ren, Bo [1]

Ueda, Yuma [2]

Kai, Atsuhiko [2]

Teraoka, Shunta [2]

Fukushima, Taku [2]

Affiliations:

[1] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan

[2] Shizuoka Univ, Hamamatsu, Shizuoka 4328561, Japan

Source:

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2014

Keywords:

POSITION-DEPENDENT CMN; SPECTRAL SUBTRACTION

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. The system combines denoising autoencoder-based cepstral-domain dereverberation, automatic selection of asynchronous recording devices (microphones or mobile terminals), and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies have addressed distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain to suppress reverberation, and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed. The optimal mobile terminals are then selected automatically, and their speech segments are used for environment adaptation. The proposed method was evaluated on a reverberant version of the WSJCAM0 corpus, which was played through a loudspeaker in a meeting room with multiple speakers and recorded by multiple far-field mobile terminals. By integrating the cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation, the average Word Error Rate (WER) with multi-condition acoustic models was reduced from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4%.
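The abstract does not specify the network architecture, so the following is only a minimal NumPy sketch of the general idea behind cepstral-domain DAE dereverberation: a network maps a window of reverberant cepstral frames to the clean centre frame. Every detail below is an assumption made for illustration, not taken from the paper. These assumptions include a single tanh hidden layer, a ±2-frame context of 13-dimensional cepstra, a skip connection so the network learns only a correction to the reverberant centre frame, and full-batch gradient descent. The sketch also uses a toy "reverberation" that adds decaying copies of past frames, whereas real reverberation is not additive in the cepstral domain. The names `toy_reverb`, `stack_context` and `ResidualCepstralDAE` are hypothetical.

```python
import numpy as np


def stack_context(feats: np.ndarray, k: int) -> np.ndarray:
    """Concatenate each frame with its k left and k right neighbours.

    Edge frames are padded by repetition; output shape is (T, (2k+1)*D),
    ordered from frame t-k to frame t+k.
    """
    T = feats.shape[0]
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * k + 1)])


def toy_reverb(clean: np.ndarray, decay: float = 0.6, taps: int = 4) -> np.ndarray:
    """Hypothetical stand-in for reverberation: add decaying copies of past frames."""
    rev = clean.copy()
    for d in range(1, taps + 1):
        rev[d:] += (decay ** d) * clean[:-d]
    return rev


def frame_mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean over frames of the squared Euclidean distance between cepstral vectors."""
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))


class ResidualCepstralDAE:
    """One-hidden-layer DAE: reverberant context window -> clean centre frame.

    Output = reverberant centre frame + learned correction. W2 starts at zero,
    so the untrained model passes the reverberant frame through unchanged.
    """

    def __init__(self, dim: int, k: int, hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.dim, self.k = dim, k
        in_dim = (2 * k + 1) * dim
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = np.zeros((hidden, dim))
        self.b2 = np.zeros(dim)

    def _forward(self, x: np.ndarray):
        h = np.tanh(x @ self.W1 + self.b1)
        centre = x[:, self.k * self.dim:(self.k + 1) * self.dim]
        return h, centre + h @ self.W2 + self.b2

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self._forward(x)[1]

    def train_step(self, x: np.ndarray, y: np.ndarray, lr: float) -> float:
        """One full-batch gradient step on 0.5 * frame_mse; returns the pre-step loss."""
        h, y_hat = self._forward(x)
        err = y_hat - y
        g_out = err / x.shape[0]
        # Backpropagate through tanh before W2 is updated.
        g_h = (g_out @ self.W2.T) * (1.0 - h ** 2)
        self.W2 -= lr * (h.T @ g_out)
        self.b2 -= lr * g_out.sum(axis=0)
        self.W1 -= lr * (x.T @ g_h)
        self.b1 -= lr * g_h.sum(axis=0)
        return 0.5 * frame_mse(y_hat, y)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.normal(size=(1000, 13))  # synthetic stand-in for clean cepstra
    rev = toy_reverb(clean)
    X = stack_context(rev, k=2)
    dae = ResidualCepstralDAE(dim=13, k=2)
    for _ in range(500):
        dae.train_step(X, clean, lr=0.02)
    print("reverberant input MSE:", frame_mse(rev, clean))
    print("DAE output MSE:       ", frame_mse(dae.predict(X), clean))
```

The skip connection is a choice made for this sketch. Because training starts from the identity mapping on the centre frame, any improvement appears directly as a lower error than that of the reverberant input. Separately, the relative error reduction reported in the abstract follows from the two WERs: (51.8 - 28.8) / 51.8 ≈ 0.444, i.e. 44.4%.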
