Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Times cited: 0

Authors:

Wang, Longbiao [1]
Ren, Bo [1]
Ueda, Yuma [2]
Kai, Atsuhiko [2]
Teraoka, Shunta [2]
Fukushima, Taku [2]

Affiliations:

[1] Nagaoka Univ Technol, Nagaoka, Niigata 9402188, Japan

[2] Shizuoka Univ, Hamamatsu, Shizuoka 4328561, Japan

Source:

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2014

Keywords:

Position-dependent CMN; Spectral subtraction

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

In this paper, we propose a robust distant-talking speech recognition system for asynchronous speech recording. The system combines denoising autoencoder-based dereverberation in the cepstral domain, automatic selection of asynchronous recording devices (microphones or mobile terminals), and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies have addressed distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain to suppress reverberation and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed; automatic selection of asynchronous mobile terminals and environment adaptation using speech segments from the optimal terminals are then applied. The proposed method was evaluated on a reverberant version of the WSJCAM0 corpus, which was played back through a loudspeaker in a meeting room with multiple speakers and recorded by multiple far-field mobile terminals. With multi-condition acoustic models, integrating the cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation reduced the average Word Error Rate (WER) from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4%.
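The abstract does not specify the network configuration, so the following is only a rough illustration of the general idea of cepstral-domain dereverberation with a denoising autoencoder: a small network is trained to map a window of reverberant cepstral frames to the corresponding clean frame. Every concrete choice here is an assumption, not the authors' setup: the hypothetical names `splice` and `CepstralDAE`, the ±2-frame context window, the single sigmoid hidden layer, full-batch gradient descent, and the toy "reverberation" modelled as a weighted sum of the current and two previous clean frames. It uses only the Python standard library.

```python
import math
import random


def splice(frames, context):
    """Stack each frame with `context` neighbours on each side (edge frames repeated)."""
    n = len(frames)
    spliced = []
    for t in range(n):
        vec = []
        for k in range(t - context, t + context + 1):
            vec.extend(frames[min(max(k, 0), n - 1)])
        spliced.append(vec)
    return spliced


class CepstralDAE:
    """Hypothetical one-hidden-layer DAE: reverberant cepstral window -> clean frame."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        a, b = 1.0 / math.sqrt(n_in), 1.0 / math.sqrt(n_hidden)
        self.W1 = [[rng.uniform(-a, a) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-b, b) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x):
        # Sigmoid hidden layer followed by a linear output layer.
        h = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + c)))
             for row, c in zip(self.W1, self.b1)]
        y = [sum(w * v for w, v in zip(row, h)) + c for row, c in zip(self.W2, self.b2)]
        return h, y

    def predict(self, x):
        return self._forward(x)[1]

    def mse(self, X, Y):
        # Mean over frames of the squared error summed over cepstral dimensions.
        return sum(sum((p - t) ** 2 for p, t in zip(self.predict(x), y))
                   for x, y in zip(X, Y)) / len(X)

    def train_step(self, X, Y, lr):
        # One full-batch gradient-descent step on mse(X, Y), gradients via backpropagation.
        gW1 = [[0.0] * len(r) for r in self.W1]
        gb1 = [0.0] * len(self.b1)
        gW2 = [[0.0] * len(r) for r in self.W2]
        gb2 = [0.0] * len(self.b2)
        for x, y in zip(X, Y):
            h, p = self._forward(x)
            d_out = [2.0 * (pj - yj) / len(X) for pj, yj in zip(p, y)]
            for j, d in enumerate(d_out):
                gb2[j] += d
                for i, hi in enumerate(h):
                    gW2[j][i] += d * hi
            for i, hi in enumerate(h):
                d = sum(d_out[j] * self.W2[j][i] for j in range(len(d_out))) * hi * (1.0 - hi)
                gb1[i] += d
                for k, xk in enumerate(x):
                    gW1[i][k] += d * xk
        # Apply the accumulated gradients only after the whole batch is processed.
        for W, gW in ((self.W1, gW1), (self.W2, gW2)):
            for row, grow in zip(W, gW):
                for k, g in enumerate(grow):
                    row[k] -= lr * g
        for bvec, gb in ((self.b1, gb1), (self.b2, gb2)):
            for k, g in enumerate(gb):
                bvec[k] -= lr * g


# Toy data (assumption): each "reverberant" frame smears a clean frame with its two predecessors.
rng = random.Random(1)
clean = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(30)]
reverb = [[c
           + 0.6 * (clean[t - 1][dim] if t >= 1 else 0.0)
           + 0.3 * (clean[t - 2][dim] if t >= 2 else 0.0)
           for dim, c in enumerate(frame)]
          for t, frame in enumerate(clean)]
X = splice(reverb, context=2)  # 5 frames x 3 coefficients = 15 inputs per example
dae = CepstralDAE(n_in=15, n_hidden=8, n_out=3)
loss_before = dae.mse(X, clean)
for _ in range(200):
    dae.train_step(X, clean, lr=0.1)
loss_after = dae.mse(X, clean)
print(f"MSE before: {loss_before:.3f}, after: {loss_after:.3f}")
```

In a real system the inputs would presumably be cepstral features computed from reverberant recordings and the targets the matching clean-speech features, with the enhanced features then passed to the LVCSR decoder; the toy data above only demonstrates that the training objective is being minimised.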


Pages: 5
