Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Cited by: 0
Authors
Wang, Longbiao [1]
Ren, Bo [1]
Ueda, Yuma [2]
Kai, Atsuhiko [2]
Teraoka, Shunta [2]
Fukushima, Taku [2]
Affiliations
[1] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan
[2] Shizuoka Univ, Hamamatsu, Shizuoka 4328561, Japan
Keywords
POSITION-DEPENDENT CMN; SPECTRAL SUBTRACTION
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this paper, we propose a robust distant-talking speech recognition system for asynchronously recorded speech. The system combines denoising autoencoder-based dereverberation in the cepstral domain, automatic selection among asynchronous speech recordings (microphones or mobile terminals), and environment adaptation. Although applications that use mobile terminals have attracted increasing attention, few studies address distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain of the speech to suppress reverberation, and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed. Automatic asynchronous mobile terminal selection is then carried out, followed by environment adaptation using speech segments from the optimal mobile terminals. The proposed method was evaluated on a reverberant version of the WSJCAM0 corpus, which was emitted by a loudspeaker and recorded by multiple far-field mobile terminals in a meeting room with multiple speakers. Integrating the cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation reduced the average Word Error Rate (WER) from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4%, when using multi-condition acoustic models.
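The relative error reduction quoted in the abstract follows directly from the two reported WER values. A minimal sketch of that arithmetic is shown below; the function name `relative_error_reduction` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative error reduction, in percent, between two WERs.

    Both arguments are word error rates expressed in percent.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# WER values reported in the abstract: baseline 51.8%, proposed system 28.8%.
reduction = relative_error_reduction(51.8, 28.8)
print(f"Relative error reduction: {reduction:.1f}%")
```

With the reported figures, the absolute reduction is 23.0 percentage points, and dividing by the 51.8% baseline gives the 44.4% relative reduction stated in the abstract.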
Pages: 5