Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Cited by: 0
Authors
Wang, Longbiao [1]
Ren, Bo [1]
Ueda, Yuma [2]
Kai, Atsuhiko [2]
Teraoka, Shunta [2]
Fukushima, Taku [2]
Affiliations
[1] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan
[2] Shizuoka Univ, Hamamatsu, Shizuoka 4328561, Japan
Keywords
POSITION-DEPENDENT CMN; SPECTRAL SUBTRACTION
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. The system combines denoising autoencoder-based cepstral-domain dereverberation, automatic asynchronous speech (microphone or mobile terminal) selection, and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies focus on distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain of speech to suppress reverberation, and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed; automatic asynchronous mobile terminal selection and environment adaptation are then carried out using speech segments from the optimal mobile terminals. The proposed method was evaluated using a reverberant version of the WSJCAM0 corpus, which was played back through a loudspeaker and recorded in a meeting room with multiple speakers by multiple far-field mobile terminals. By integrating the cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation, the average Word Error Rate (WER) was reduced from 51.8% for the baseline system to 28.8%, i.e., a relative error reduction of 44.4%, when using multi-condition acoustic models.
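The relative error reduction quoted in the abstract can be checked from the two WER figures it reports. The short Python sketch below is only an illustration of that arithmetic; the function name `relative_error_reduction` is a hypothetical helper and is not taken from the paper.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction, in percent, with respect to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# WER values (in percent) as reported in the abstract.
baseline = 51.8
proposed = 28.8

absolute = baseline - proposed  # reduction in percentage points
relative = relative_error_reduction(baseline, proposed)

print(f"absolute reduction: {absolute:.1f} points")
print(f"relative reduction: {relative:.1f}%")
```

The distinction matters when comparing systems: the absolute reduction is 23.0 percentage points, while the 44.4% figure is relative to the baseline WER of 51.8%.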
Pages: 5