CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments

Cited by: 2

Authors:

Fukumori, Takahiro [1]

Nishiura, Takanobu [1]

Nakayama, Masato [2]

Denda, Yuki [3]

Kitaoka, Norihide [4]

Yamada, Takeshi [7]

Yamamoto, Kazumasa [8]

Tsuge, Satoru [9]

Fujimoto, Masakiyo [10]

Takiguchi, Tetsuya [11]

Miyajima, Chiyomi [5]

Tamura, Satoshi [12, 13]

Ogawa, Tetsuji [14]

Matsuda, Shigeki [15]

Kuroiwa, Shingo [17, 18]

Takeda, Kazuya [5, 6]

Nakamura, Satoshi [15, 16]

Affiliations:

[1] Ritsumeikan Univ, Kusatsu 5258577, Japan

[2] Kinki Univ, Kinokawa 6496493, Japan

[3] Murata Machinery Ltd, Kyoto 6128686, Japan

[4] Nagoya Univ, Grad Sch Informat Sci, Dept Media Sci, Nagoya, Aichi 4648603, Japan

[5] Nagoya Univ, Grad Sch Informat Sci, Nagoya, Aichi 4648603, Japan

[6] Nagoya Univ, Grad Sch, Nagoya, Aichi 4648603, Japan

[7] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058573, Japan

[8] Toyohashi Univ Technol, Dept Informat & Comp Sci, Toyohashi, Aichi 4418580, Japan

[9] Daido Univ, Sch Informat, Dept Informat Syst, Nagoya, Aichi 4578530, Japan

[10] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan

[11] Kobe Univ, Kobe, Hyogo 6578501, Japan

[12] Gifu Univ, Dept Comp Sci, Gifu 5011193, Japan

[13] Gifu Univ, Gifu 5011193, Japan

[14] Waseda Univ, Tokyo 1698050, Japan

[15] Natl Inst Informat & Commun Technol, Kyoto 6190288, Japan

[16] Natl Inst Informat & Commun Technol, MASTAR Project, Kyoto 6190288, Japan

[17] Chiba Univ, Grad Sch Adv Integrat Sci, Chiba 2638522, Japan

[18] Natl Inst Informat & Commun Technol, Chiba 2638522, Japan

Source:

ACOUSTICAL SCIENCE AND TECHNOLOGY | 2011 / Vol. 32 / No. 5

Keywords:

Reverberant speech database; Reverberant speech recognition; Various recording environments; Room impulse response; Evaluation framework;

DOI:

10.1250/ast.32.201

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We have been distributing a new collection of databases and evaluation tools called CENSREC-4, which is a framework for evaluating distant-talking speech in reverberant environments. The data contained in CENSREC-4 are connected digit utterances as in CENSREC-1. Two subsets are included in the data: "basic data sets" and "extra data sets." The basic data sets are used for evaluating the room impulse response-convolved speech data to simulate the various reverberations. The extra data sets consist of simulated data and corresponding real recorded data. Evaluation tools are presently only provided for the basic data sets and will be provided for the extra data sets in the future. The task of CENSREC-4 with a basic data set appears simple; however, the results of experiments prove that CENSREC-4 provides a challenging reverberant speech-recognition task, in the sense that a traditional technique to improve recognition and a widely used criterion to represent the difficulty of recognition deliver poor performance. Within this context, this common framework can be an important step toward the future evolution of reverberant speech-recognition methodologies.


Pages: 201-210

Number of pages: 10

