Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Cited by: 0
Authors
Wang, Longbiao [1]
Ren, Bo [1]
Ueda, Yuma [2]
Kai, Atsuhiko [2]
Teraoka, Shunta [2]
Fukushima, Taku [2]
Affiliations
[1] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan
[2] Shizuoka Univ, Hamamatsu, Shizuoka 4328561, Japan
Keywords
POSITION-DEPENDENT CMN; SPECTRAL SUBTRACTION;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. The system combines denoising autoencoder-based cepstral-domain dereverberation, automatic asynchronous speech (microphone or mobile terminal) selection, and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies focus on distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain of speech to suppress reverberation, and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed; automatic asynchronous mobile terminal selection and environment adaptation are then carried out using speech segments from the optimal mobile terminals. The proposed method was evaluated using a reverberant version of the WSJCAM0 corpus, which was emitted by a loudspeaker and recorded in a meeting room with multiple speakers by multiple far-field mobile terminals. By integrating a cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation, the average Word Error Rate (WER) was reduced from 51.8% for the baseline system to 28.8%, i.e., a relative error reduction of 44.4%, when using multi-condition acoustic models.
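The relative error reduction quoted in the abstract can be checked from the two WER figures it reports. The snippet below is only an arithmetic sketch; the function name `relative_error_reduction` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER.

    Hypothetical helper for illustration; both arguments are WERs in percent.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# WER figures as reported in the abstract (in percent).
baseline = 51.8
proposed = 28.8

reduction = relative_error_reduction(baseline, proposed)
print(f"relative error reduction: {reduction:.1f}%")
```

With the reported values, (51.8 - 28.8) / 51.8 rounds to 44.4%, which agrees with the figure stated in the abstract.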
Pages: 5
Related papers
50 in total
  • [31] Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments - Newest Part of the CENSREC Series -
    Nishiura, Takanobu
    Nakayama, Masato
    Denda, Yuki
    Kitaoka, Norihide
    Yamamoto, Kazumasa
    Yamada, Takeshi
    Tsuge, Satoru
    Miyajima, Chiyomi
    Fujimoto, Masakiyo
    Takiguchi, Tetsuya
    Tamura, Satoshi
    Kuroiwa, Shingo
    Takeda, Kazuya
    Nakamura, Satoshi
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1828 - 1834
  • [32] CENSREC-4: Development of Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments
    Nakayama, Masato
    Nishiura, Takanobu
    Denda, Yuki
    Kitaoka, Norihide
    Yamamoto, Kazumasa
    Yamada, Takeshi
    Tsuge, Satoru
    Miyajima, Chiyomi
    Fujimoto, Masakiyo
    Takiguchi, Tetsuya
    Tamura, Satoshi
    Ogawa, Tetsuji
    Matsuda, Shigeki
    Kuroiwa, Shingo
    Takeda, Kazuya
    Nakamura, Satoshi
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 968 - +
  • [33] Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array
    Yamada, T
    Nakamura, S
    Shikano, K
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10 (02): : 48 - 56
  • [34] A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition
    Tang, Hao
    Hsu, Wei-Ning
    Grondin, Francois
    Glass, James
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2928 - 2932
  • [35] Speech selection and environmental adaptation for asynchronous speech recognition
    Ren, Bo
    Wang, Longbiao
    Kai, Atsuhiko
    Zhang, Zhaofeng
    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 119 - 124
  • [36] 3-D N-best search for simultaneous recognition of distant-talking speech of multiple talkers
    Nakamura, S
    Heracleous, P
    FOURTH IEEE INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, PROCEEDINGS, 2002, : 59 - 63
  • [37] Whispered speech recognition using deep denoising autoencoder
    Grozdic, Dorde T.
    Jovicic, Slobodan T.
    Subotic, Misko
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2017, 59 : 15 - 22
  • [38] Music Removal by Convolutional Denoising Autoencoder in Speech Recognition
    Zhao, Mengyuan
    Wang, Dong
    Zhang, Zhiyong
    Zhang, Xuewei
    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 338 - 341
  • [39] Distant-talking speech recognition using multi-channel LMS and multiple-step linear prediction
    Shiota, Satoshi
    Wang, Longbiao
    Odani, Kyohei
    Kai, Atsuhiko
    Li, Weifeng
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 384 - +
  • [40] Deep learning based distant-talking speech processing in real-world sound environments
    Araki, Shoko
    Fujimoto, Masakiyo
    Yoshioka, Takuya
    Delcroix, Marc
    Espi, Miquel
    Nakatani, Tomohiro
    NTT Technical Review, 2015, 13 (11):