Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Cited by: 0

Authors:

Wang, Longbiao [1]

Ren, Bo [1]

Ueda, Yuma [2]

Kai, Atsuhiko [2]

Teraoka, Shunta [2]

Fukushima, Taku [2]

Affiliations:

[1] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan

[2] Shizuoka Univ, Hamamatsu, Shizuoka 4328561, Japan

Source:

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2014

Keywords:

POSITION-DEPENDENT CMN; SPECTRAL SUBTRACTION

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. The system combines denoising autoencoder-based cepstral-domain dereverberation, automatic selection of asynchronous speech (from a microphone or mobile terminal), and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies focus on distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain of speech to suppress reverberation, and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed; automatic asynchronous mobile terminal selection and environment adaptation are then carried out using speech segments from the optimal mobile terminals. The proposed method was evaluated on a reverberant WSJCAM0 corpus, which was played through a loudspeaker and recorded in a meeting room with multiple speakers by multiple far-field mobile terminals. By integrating a cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation, the average Word Error Rate (WER) was reduced from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4%, when using multi-condition acoustic models.


Pages: 5
