Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Cited by: 0
Authors
Wang, Longbiao [1]
Ren, Bo [1]
Ueda, Yuma [2]
Kai, Atsuhiko [2]
Teraoka, Shunta [2]
Fukushima, Taku [2]
Affiliations
[1] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan
[2] Shizuoka Univ, Hamamatsu, Shizuoka 4328561, Japan
Keywords
POSITION-DEPENDENT CMN; SPECTRAL SUBTRACTION
DOI
Not available
CLC number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In this paper, we propose a robust distant-talking speech recognition system for asynchronously recorded speech. The system combines denoising autoencoder-based dereverberation in the cepstral domain, automatic selection among asynchronous recording devices (microphones or mobile terminals), and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies have addressed distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain to suppress reverberation and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed; the optimal mobile terminals are then selected automatically, and environment adaptation is carried out using the speech segments recorded by those terminals. The proposed method was evaluated on a reverberant WSJCAM0 corpus, which was played back through a loudspeaker in a meeting room with multiple speakers and recorded by multiple far-field mobile terminals. By integrating the cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation, the average Word Error Rate (WER) was reduced from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4%, when multi-condition acoustic models were used.
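The abstract does not specify the autoencoder configuration. As a rough illustration of the general idea behind cepstral-domain DAE dereverberation, the following pure-Python sketch trains a one-hidden-layer denoising autoencoder to map spliced reverberant cepstral frames to the corresponding clean frames. Everything here is an assumption for illustration, not the authors' setup: the toy data generator `make_data` (a first-order recursive smearing controlled by a hypothetical `DECAY`), the dimensions, the two-frame splicing context, and the training settings.

```python
import math
import random

random.seed(0)

DIM = 4        # toy cepstral dimension (hypothetical; real front ends use more coefficients)
CONTEXT = 2    # input = current reverberant frame + previous frame (assumption; must match splice())
HIDDEN = 8     # hidden units (assumption)
DECAY = 0.5    # hypothetical decay of the toy "reverberation" tail


def make_data(n_frames):
    """Toy data: random clean cepstra x_t and a smeared version y_t = x_t + DECAY * y_{t-1}."""
    clean = [[random.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(n_frames)]
    reverb, prev = [], [0.0] * DIM
    for frame in clean:
        cur = [x + DECAY * p for x, p in zip(frame, prev)]
        reverb.append(cur)
        prev = cur
    return clean, reverb


def splice(frames, t):
    """Concatenate frame t with frame t-1 (zeros before the first frame)."""
    prev = frames[t - 1] if t > 0 else [0.0] * DIM
    return frames[t] + prev


class DenoisingAutoencoder:
    """One tanh hidden layer and a linear output layer, trained to map
    corrupted (reverberant) input to the clean target frame."""

    def __init__(self, n_in, n_hidden, n_out):
        self.w1 = [[random.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[random.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def sgd_step(self, x, target, lr):
        """One SGD step on L = 0.5 * ||y - target||^2; returns L before the update."""
        h, y = self.forward(x)
        err = [yk - tk for yk, tk in zip(y, target)]  # dL/dy
        # Hidden-layer gradient, computed with w2 before it is updated.
        grad_h = [sum(self.w2[k][j] * err[k] for k in range(len(err))) * (1.0 - h[j] ** 2)
                  for j in range(len(h))]
        for k, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * e * hj
            self.b2[k] -= lr * e
        for j, g in enumerate(grad_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g
        return 0.5 * sum(e * e for e in err)


def mean_loss(model, clean, reverb):
    """Average squared reconstruction error over all frames."""
    total = 0.0
    for t, target in enumerate(clean):
        _, y = model.forward(splice(reverb, t))
        total += 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))
    return total / len(clean)


clean, reverb = make_data(200)
model = DenoisingAutoencoder(DIM * CONTEXT, HIDDEN, DIM)
loss_before = mean_loss(model, clean, reverb)
for epoch in range(30):
    for t in random.sample(range(len(clean)), len(clean)):  # shuffled frame order
        model.sgd_step(splice(reverb, t), clean[t], lr=0.05)
loss_after = mean_loss(model, clean, reverb)
print(f"mean reconstruction loss: {loss_before:.4f} -> {loss_after:.4f}")
```

Splicing the previous frame gives the network access to the reverberant tail it has to remove, which a single-frame input could not see. In a pipeline like the one described above, the DAE output would stand in for the reverberant cepstral features before LVCSR decoding. Terminal selection and environment adaptation operate on the recognition stage and are not modeled in this sketch.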
Pages: 5
Related papers
50 in total
  • [41] Distant-talking Speech Recognition Based on Multi-objective Learning using Phase and Magnitude-based Feature
    Li, Dongbo
    Wang, Longbiao
    Dang, Jianwu
    Ge, Meng
    Guan, Haotian
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 394 - 398
  • [42] Distant-talking accent recognition by combining GMM and DNN
    Phapatanaburi, Khomdet
    Wang, Longbiao
    Sakagami, Ryota
    Zhang, Zhaofeng
    Li, Ximin
    Iwahashi, Masahiro
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (09) : 5109 - 5124
  • [43] Denoising Convolutional Autoencoder Based Approach for Disordered Speech Recognition
    Chandrakala, S.
    Vishnika, Veni S.
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2024, 33 (01)
  • [44] Group Delay Based Methods for Recognition of Distant talking Speech
    Mandala, Rohan
    Shukla, Mrityunjaya
    Hegde, Rajesh
    2010 CONFERENCE RECORD OF THE FORTY FOURTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2010, : 1702 - 1706
  • [46] Effective Acoustic Adaptation for A Distant-talking Interactive TV System
    Huang, Jing
    Epstein, Mark
    Matassoni, Marco
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1709 - +
  • [47] Rapid environment adaptation for speech recognition
    Takagi, Keizaburo
    1600, Maruzen Co, Tokyo, Japan (16)
  • [48] Simultaneous recognition of distant-talking speech of multiple talkers based on the 3-D N-best search method
    Heracleous, P
    Nakamura, S
    Shikano, K
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 36 (2-3): : 105 - 116
  • [50] Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification
    Zhang, Zhaofeng
    Wang, Longbiao
    Kai, Atsuhiko
    Yamada, Takanori
    Li, Weifeng
    Iwahashi, Masahiro
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015,