Environment-dependent denoising autoencoder for distant-talking speech recognition

Cited by: 13
Authors
Ueda, Yuma [1 ]
Wang, Longbiao [2 ]
Kai, Atsuhiko [1 ]
Ren, Bo [2 ]
Affiliations
[1] Shizuoka Univ, Grad Sch Engn, Naka Ku, Hamamatsu, Shizuoka 4328561, Japan
[2] Nagaoka Univ Technol, 1603-1 Kamitomioka, Nagaoka, Niigata 9402188, Japan
Keywords
Speech recognition; Dereverberation; Denoising autoencoder; Environment identification; Distant-talking speech; SPECTRAL SUBTRACTION; DEREVERBERATION; MODEL; REVERBERATION; ENHANCEMENT; DOMAIN; NOISE;
DOI
10.1186/s13634-015-0278-y
CLC classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Subject classification
0808; 0809
Abstract
In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because they can learn higher-level representations and more flexible feature mapping functions. However, a DAE degrades when the training and test environments are mismatched. In a conventional DAE, parameters are trained using pairs of reverberant speech and clean speech under various acoustic conditions (that is, an environment-independent DAE). To address this problem, we propose two environment-dependent DAEs that reduce the influence of mismatches between training and test environments. In the first approach, we train multiple DAEs on speech from different acoustic environments, and the DAE whose training condition best matches the test condition is automatically selected (that is, a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both reverberant speech and estimated reverberation. In the second approach, we append estimated reverberation features to the input of the DAE (that is, a one-step environment-dependent DAE, or reverberation-aware DAE). The proposed methods are evaluated on speech in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For the two-step environment-dependent DAE, environment identification based on the proposed DNN also outperforms the conventional DNN approach, in which only reverberant speech is used and reverberation is not blindly estimated. Furthermore, the one-step environment-dependent DAE significantly outperforms the two-step environment-dependent DAE.
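A minimal sketch of the one-step (reverberation-aware) DAE input convention described in the abstract: blindly estimated reverberation features are concatenated with the reverberant-speech features before enhancement. The feature dimensions, the single hidden layer, and the dae_forward helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D_FEAT = 40   # reverberant-speech feature dimension (assumed)
D_REV = 40    # estimated-reverberation feature dimension (assumed)
D_HID = 128   # hidden layer width (illustrative; the paper's DAE is deeper)

# Randomly initialized weights stand in for a trained network; training
# against clean-speech targets is omitted from this sketch.
W1 = rng.standard_normal((D_FEAT + D_REV, D_HID)) * 0.01
b1 = np.zeros(D_HID)
W2 = rng.standard_normal((D_HID, D_FEAT)) * 0.01
b2 = np.zeros(D_FEAT)

def dae_forward(reverberant, reverb_estimate):
    """Map reverberant features to an estimate of clean features.

    Concatenating the reverberation estimate with the input makes the
    enhancement environment-aware in a single step, in contrast to the
    two-step variant, which first identifies the environment and then
    selects a matched DAE.
    """
    x = np.concatenate([reverberant, reverb_estimate], axis=-1)
    h = np.tanh(x @ W1 + b1)   # nonlinear hidden representation
    return h @ W2 + b2         # enhanced (dereverberated) features

# Toy usage on one feature frame.
frame = rng.standard_normal(D_FEAT)
reverb = rng.standard_normal(D_REV)
enhanced = dae_forward(frame, reverb)
print(enhanced.shape)  # (40,)
```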
Pages: 11
Related papers
50 items in total
  • [31] Distant-talking robust speech recognition using late reflection components of room impulse response
    Gomez, Randy
    Even, Jani
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4581 - 4584
  • [32] Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments - Newest Part of the CENSREC Series -
    Nishiura, Takanobu
    Nakayama, Masato
    Denda, Yuki
    Kitaoka, Norihide
    Yamamoto, Kazumasa
    Yamada, Takeshi
    Tsuge, Satoru
    Miyajima, Chiyomi
    Fujimoto, Masakiyo
    Takiguchi, Tetsuya
    Tamura, Satoshi
    Kuroiwa, Shingo
    Takeda, Kazuya
    Nakamura, Satoshi
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1828 - 1834
  • [33] Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
    Zhang, Zhaofeng
    Wang, Longbiao
    Kai, Atsuhiko
    Yamada, Takanori
    Li, Weifeng
    Iwahashi, Masahiro
    EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, 2015
  • [34] CENSREC-4: Development of Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments
    Nakayama, Masato
    Nishiura, Takanobu
    Denda, Yuki
    Kitaoka, Norihide
    Yamamoto, Kazumasa
    Yamada, Takeshi
    Tsuge, Satoru
    Miyajima, Chiyomi
    Fujimoto, Masakiyo
    Takiguchi, Tetsuya
    Tamura, Satoshi
    Ogawa, Tetsuji
    Matsuda, Shigeki
    Kuroiwa, Shingo
    Takeda, Kazuya
    Nakamura, Satoshi
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 968+
  • [35] Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array
    Yamada, T
    Nakamura, S
    Shikano, K
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10 (02): 48 - 56
  • [36] Dereverberation Based on Generalized Spectral Subtraction for Distant-talking Speaker Recognition
    Zhang, Zhaofeng
    Wang, Longbiao
    Kai, Atsuhiko
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [37] 3-D N-best search for simultaneous recognition of distant-talking speech of multiple talkers
    Nakamura, S
    Heracleous, P
    FOURTH IEEE INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, PROCEEDINGS, 2002, : 59 - 63
  • [38] Distant-talking speech recognition using multi-channel LMS and multiple-step linear prediction
    Shiota, Satoshi
    Wang, Longbiao
    Odani, Kyohei
    Kai, Atsuhiko
    Li, Weifeng
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 384+
  • [39] Minimum Kullback-Leibler distance based multivariate Gaussian feature adaptation for distant-talking speech recognition
    Pan, Y
    Waibel, A
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 1029 - 1032
  • [40] Reverberant Speech Recognition Based on Denoising Autoencoder
    Ishii, Takaaki
    Komiyama, Hiroki
    Shinozaki, Takahiro
    Horiuchi, Yasuo
    Kuroiwa, Shingo
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3479 - 3483