Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments - Newest Part of the CENSREC Series -

Cited by: 0
Authors
Nishiura, Takanobu
Nakayama, Masato
Denda, Yuki
Kitaoka, Norihide
Yamamoto, Kazumasa
Yamada, Takeshi
Tsuge, Satoru
Miyajima, Chiyomi
Fujimoto, Masakiyo
Takiguchi, Tetsuya
Tamura, Satoshi
Kuroiwa, Shingo
Takeda, Kazuya
Nakamura, Satoshi
DOI
Not available
Chinese Library Classification
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
Speech recognition performance has recently improved drastically thanks to statistical methods and very large speech databases. Attention is now turning to improving performance in realistic environments such as noisy conditions. Since October 2001, our working group of the Information Processing Society in Japan has been developing evaluation methodologies and frameworks for Japanese noisy speech recognition. We have released frameworks, each comprising databases and evaluation tools: CENSREC-1 (Corpus and Environment for Noisy Speech RECognition 1; formerly AURORA-2J), CENSREC-2 (in-car connected digits recognition), CENSREC-3 (in-car isolated word recognition), and CENSREC-1-C (voice activity detection under noisy conditions). In this paper, we introduce a new collection of databases and evaluation tools named CENSREC-4, an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface, so we measured room impulse responses to investigate reverberant speech recognition. Evaluation experiments showed that a traditional dereverberation process had difficulty sufficiently improving recognition performance, which indicates that CENSREC-4 is an effective database for evaluating new dereverberation methods. The framework was released in March 2008, and many studies are being conducted with it in Japan.
Pages: 1828-1834
Page count: 7