Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Cited by: 0

Authors:

Wang, Longbiao [1]

Ren, Bo [1]

Ueda, Yuma [2]

Kai, Atsuhiko [2]

Teraoka, Shunta [2]

Fukushima, Taku [2]

Affiliations:

[1] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan

[2] Shizuoka Univ, Hamamatsu, Shizuoka 4328561, Japan

Source:

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2014

Keywords:

POSITION-DEPENDENT CMN; SPECTRAL SUBTRACTION

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. It combines denoising autoencoder-based cepstral-domain dereverberation, automatic selection of asynchronous speech (from a microphone or mobile terminal), and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies focus on distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain of speech to suppress reverberation, and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed; we then apply automatic asynchronous mobile terminal selection and environment adaptation using speech segments from the optimal mobile terminals. The proposed method was evaluated on a reverberant version of the WSJCAM0 corpus, which was played back through a loudspeaker and recorded in a meeting room with multiple speakers by multiple far-field mobile terminals. By integrating a cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation, the average Word Error Rate (WER) was reduced from 51.8% for the baseline system to 28.8%, i.e., a relative error reduction of 44.4%, when using multi-condition acoustic models.
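The relative error reduction quoted in the abstract can be checked from the two reported WER figures. The snippet below is only an arithmetic sketch; the function name `relative_error_reduction` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative error reduction in percent.

    Hypothetical helper for illustration: both arguments are word error
    rates expressed in percent, as reported in the abstract.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# WER figures reported in the abstract: 51.8% (baseline) and 28.8% (proposed).
reduction = relative_error_reduction(51.8, 28.8)
print(f"{reduction:.1f}%")  # prints "44.4%"
```

The absolute reduction is 23.0 percentage points; dividing by the 51.8% baseline gives the 44.4% relative figure stated above.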

Pages: 5
