Environment-dependent denoising autoencoder for distant-talking speech recognition

Cited by: 13
Authors
Ueda, Yuma [1 ]
Wang, Longbiao [2 ]
Kai, Atsuhiko [1 ]
Ren, Bo [2 ]
Affiliations
[1] Shizuoka Univ, Grad Sch Engn, Naka Ku, Hamamatsu, Shizuoka 4328561, Japan
[2] Nagaoka Univ Technol, 1603-1 Kamitomioka, Nagaoka, Niigata 9402188, Japan
Keywords
Speech recognition; Dereverberation; Denoising autoencoder; Environment identification; Distant-talking speech; SPECTRAL SUBTRACTION; DEREVERBERATION; MODEL; REVERBERATION; ENHANCEMENT; DOMAIN; NOISE;
DOI
10.1186/s13634-015-0278-y
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Subject classification codes
0808 ; 0809 ;
Abstract
In this paper, we propose an environment-dependent denoising autoencoder (DAE) and automatic environment identification based on a deep neural network (DNN) with blind reverberation estimation for robust distant-talking speech recognition. Recently, DAEs have been shown to be effective in many noise reduction and reverberation suppression applications because they can learn higher-level representations and a more flexible feature-mapping function. However, a DAE performs poorly when the training and test environments are mismatched. In a conventional DAE, parameters are trained using pairs of reverberant and clean speech under various acoustic conditions (that is, an environment-independent DAE). To address this problem, we propose two environment-dependent DAEs that reduce the influence of mismatches between training and test environments. In the first approach, we train several DAEs on speech from different acoustic environments, and the DAE for the condition that best matches the test condition is automatically selected (that is, a two-step environment-dependent DAE). To improve environment identification performance, we propose a DNN that uses both reverberant speech and estimated reverberation. In the second approach, we append estimated reverberation features to the input of the DAE (that is, a one-step environment-dependent DAE, or reverberation-aware DAE). The proposed methods are evaluated on speech in simulated and real reverberant environments. Experimental results show that the environment-dependent DAE outperforms the environment-independent one in both simulated and real reverberant environments. For the two-step environment-dependent DAE, environment identification based on the proposed DNN also outperforms the conventional DNN approach, which uses only reverberant speech without blind reverberation estimation. Moreover, the one-step environment-dependent DAE significantly outperforms the two-step environment-dependent DAE.
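The one-step (reverberation-aware) DAE described in the abstract can be pictured as a single-hidden-layer network that maps reverberant features, concatenated with an estimated reverberation parameter, back to clean features. The following is a minimal toy sketch in NumPy, not the paper's actual system: the "clean" frames are random vectors, reverberation is simulated by a crude exponential decay across frames, and a known decay constant stands in for the blindly estimated reverberation feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative assumptions, not the paper's features):
# random "clean" feature frames, smeared by an exponential decay to
# simulate reverberation; a constant decay value plays the role of the
# blindly estimated reverberation parameter appended to the input.
n_frames, n_dim, n_hidden = 200, 8, 16
clean = rng.standard_normal((n_frames, n_dim))
decay = 0.8
reverb = clean.copy()
for t in range(1, n_frames):
    reverb[t] += decay * reverb[t - 1]        # crude reverberation tail
rev_est = np.full((n_frames, 1), decay)       # stand-in "blind" estimate

# Reverberation-aware input: reverberant features + reverberation feature
x = np.concatenate([reverb, rev_est], axis=1)

# One hidden layer (tanh encoder, linear decoder), trained with plain
# full-batch gradient descent on mean-squared error against clean frames.
W1 = rng.standard_normal((n_dim + 1, n_hidden)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_dim)) * 0.1
b2 = np.zeros(n_dim)

lr = 0.02
for _ in range(2000):
    h = np.tanh(x @ W1 + b1)                  # encoder
    y = h @ W2 + b2                           # decoder
    err = y - clean                           # gradient of 0.5*MSE
    gW2 = h.T @ err / n_frames
    gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1 - h ** 2)          # backprop through tanh
    gW1 = x.T @ gh / n_frames
    gb1 = gh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse_in = ((reverb - clean) ** 2).mean()       # distortion before the DAE
mse_out = ((np.tanh(x @ W1 + b1) @ W2 + b2 - clean) ** 2).mean()
print(f"input MSE {mse_in:.3f} -> DAE output MSE {mse_out:.3f}")
```

A real system in this line of work would use log-spectral or cepstral frames with temporal context and a deeper network, and the appended feature would come from an actual blind reverberation estimator; this sketch only shows where the reverberation-aware input enters the mapping.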
Pages: 11
Related papers
50 records in total
  • [41] Distant-talking Speech Recognition Based on Multi-objective Learning using Phase and Magnitude-based Feature
    Li, Dongbo
    Wang, Longbiao
    Dang, Jianwu
    Ge, Meng
    Guan, Haotian
2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018: 394 - 398
  • [42] Whispered speech recognition using deep denoising autoencoder
    Grozdic, Dorde T.
    Jovicic, Slobodan T.
    Subotic, Misko
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2017, 59 : 15 - 22
  • [43] Music Removal by Convolutional Denoising Autoencoder in Speech Recognition
    Zhao, Mengyuan
    Wang, Dong
    Zhang, Zhiyong
    Zhang, Xuewei
2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015: 338 - 341
  • [44] Deep learning based distant-talking speech processing in real-world sound environments
    Araki, Shoko
    Fujimoto, Masakiyo
    Yoshioka, Takuya
    Delcroix, Marc
    Espi, Miquel
    Nakatani, Tomohiro
NTT TECHNICAL REVIEW, 2015, 13 (11)
  • [45] Text-independent speaker identification in a distant-talking multi-microphone environment
    Ji, Mikyong
    Kim, Sungtak
    Kim, Hoirin
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (11) : 1892 - 1895
  • [46] Simultaneous recognition of distant-talking speech of multiple talkers based on the 3-D N-best search method
    Heracleous, P
    Nakamura, S
    Shikano, K
JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 36 (2-3): 105 - 116
  • [48] A reverberation robust target speech detection method using dual-microphone in distant-talking scene
    Wang, Xiaofei
    Guo, Yanmeng
    Wu, Chao
    Fu, Qiang
    Yan, Yonghong
    SPEECH COMMUNICATION, 2015, 72 : 47 - 58
  • [49] Denoising Convolutional Autoencoder Based Approach for Disordered Speech Recognition
    Chandrakala, S.
    Vishnika, Veni S.
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2024, 33 (01)
  • [50] Simultaneous recognition of distant-talking speech of multiple sound sources based on 3-D N-best search algorithm
    Heracleous, P
    Nakamura, S
    Shikano, K
ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001: 111 - 114