Environment-dependent denoising autoencoder for distant-talking speech recognition

Cited: 0
Authors
Yuma Ueda
Longbiao Wang
Atsuhiko Kai
Bo Ren
Affiliations
[1] Shizuoka University, Graduate School of Engineering
[2] Nagaoka University of Technology
Keywords
Speech recognition; Dereverberation; Denoising autoencoder; Environment identification; Distant-talking speech;
DOI: not available
Abstract
In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because higher-level representations and increased flexibility of the feature mapping function can be learned. However, a DAE is not adequate when the training and test environments are mismatched. In a conventional DAE, parameters are trained using pairs of reverberant speech and clean speech under various acoustic conditions (that is, an environment-independent DAE). To address this problem, we propose two environment-dependent DAEs to reduce the influence of mismatches between training and test environments. In the first approach, we train various DAEs using speech from different acoustic environments, and the DAE for the condition that best matches the test condition is automatically selected (that is, a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both reverberant speech and estimated reverberation. In the second approach, we add estimated reverberation features to the input of the DAE (that is, a one-step environment-dependent DAE, or reverberation-aware DAE). The proposed method is evaluated using speech in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For the two-step environment-dependent DAE, environment identification based on the proposed DNN also outperforms the conventional DNN approach, in which only reverberant speech is used and reverberation is not blindly estimated. Furthermore, the one-step environment-dependent DAE significantly outperforms the two-step environment-dependent DAE.
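The core design choice of the one-step (reverberation-aware) DAE is that the network input concatenates the reverberant features with blindly estimated reverberation features, and the network is trained to output clean features. The following is a minimal NumPy sketch of that idea only, not the authors' implementation: the layer sizes, training hyperparameters, and toy data are all illustrative assumptions, and a real system would use multi-frame spectral features and a deeper network.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_reverb_aware_dae(X_rev, X_rvb, X_clean, hidden=32, lr=0.1, epochs=800):
    """Train a one-hidden-layer DAE (illustrative sketch) whose input is the
    concatenation of reverberant features and estimated reverberation
    features, and whose target is the clean features (MSE loss, plain GD)."""
    X = np.hstack([X_rev, X_rvb])          # reverberation-aware input
    n, d_in = X.shape
    d_out = X_clean.shape[1]
    W1 = rng.normal(0.0, 0.1, (d_in, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d_out)); b2 = np.zeros(d_out)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encoder
        Y = H @ W2 + b2                    # linear decoder
        err = Y - X_clean                  # d(MSE)/dY up to a constant factor
        dH = (err @ W2.T) * (1.0 - H**2)   # backprop through tanh
        W2 -= lr * H.T @ err / n; b2 -= lr * err.mean(axis=0)
        W1 -= lr * X.T @ dH / n;  b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def enhance(X_rev, X_rvb, params):
    """Map reverberant features (plus reverberation estimate) to enhanced ones."""
    W1, b1, W2, b2 = params
    X = np.hstack([X_rev, X_rvb])
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Toy data: "reverberation" smears each frame's features; the DAE sees both
# the reverberant frame and an imperfect blind estimate of the reverberation.
n, d = 300, 8
X_clean = rng.normal(size=(n, d))
reverb = 0.5 * np.roll(X_clean, 1, axis=1)          # synthetic smearing
X_rev = X_clean + reverb + 0.1 * rng.normal(size=(n, d))
X_rvb = reverb + 0.05 * rng.normal(size=(n, d))     # blind reverb estimate

params = train_reverb_aware_dae(X_rev, X_rvb, X_clean)
X_enh = enhance(X_rev, X_rvb, params)
```

Because the reverberation estimate is appended to the input rather than used to select among several pre-trained DAEs, a single network can adapt its mapping to the acoustic condition, which is what distinguishes the one-step approach from the two-step (select-then-enhance) approach described above.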
Related papers
50 records
  • [21] Distant-talking Continuous Speech Recognition based on a novel Reverberation Model in the Feature Domain
    Sehr, Armin
    Zeller, Marcus
    Kellermann, Walter
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 769 - 772
  • [22] Distant-talking accent recognition by combining GMM and DNN
    Khomdet Phapatanaburi
    Longbiao Wang
    Ryota Sakagami
    Zhaofeng Zhang
    Ximin Li
    Masahiro Iwahashi
    Multimedia Tools and Applications, 2016, 75 : 5109 - 5124
  • [23] Phase and reverberation aware DNN for distant-talking speech enhancement
    Oo, Zeyan
    Wang, Longbiao
    Phapatanaburi, Khomdet
    Iwahashi, Masahiro
    Nakagawa, Seiichi
    Dang, Jianwu
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (14) : 18865 - 18880
  • [24] Speech intelligibility under in-car distant-talking environments
    Mizumachi, Mitsunori
    Takuma, Shota
    Ohsugi, Ikuyo
    Hamada, Yasushi
    Nishi, Koichi
    Proceedings of the INTER-NOISE 2016 - 45th International Congress and Exposition on Noise Control Engineering: Towards a Quieter Future, 2016, : 389 - 393
  • [25] Investigations into Early and Late Reflections on Distant-Talking Speech Recognition Toward Suitable Reverberation Criteria
    Nishiura, Takanobu
    Hirano, Yoshiki
    Denda, Yuki
    Nakayama, Masato
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1369 - 1372
  • [27] Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm
    Wang, Longbiao
    Kitaoka, Norihide
    Nakagawa, Seiichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (03): : 659 - 667
  • [28] Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
    Zhang, Zhaofeng
    Wang, Longbiao
    Kai, Atsuhiko
    Yamada, Takanori
    Li, Weifeng
    Iwahashi, Masahiro
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015
  • [29] Reverberation Model-Based Decoding in the Logmelspec Domain for Robust Distant-Talking Speech Recognition
    Sehr, Armin
    Maas, Roland
    Kellermann, Walter
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07): : 1676 - 1691
  • [30] JOINT SPARSE REPRESENTATION BASED CEPSTRAL-DOMAIN DEREVERBERATION FOR DISTANT-TALKING SPEECH RECOGNITION
    Li, Weifeng
    Wang, Longbiao
    Zhou, Fei
    Liao, Qingmin
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7117 - 7120