CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments

Cited by: 2

Authors:

Fukumori, Takahiro [1]

Nishiura, Takanobu [1]

Nakayama, Masato [2]

Denda, Yuki [3]

Kitaoka, Norihide [4]

Yamada, Takeshi [7]

Yamamoto, Kazumasa [8]

Tsuge, Satoru [9]

Fujimoto, Masakiyo [10]

Takiguchi, Tetsuya [11]

Miyajima, Chiyomi [5]

Tamura, Satoshi [12,13]

Ogawa, Tetsuji [14]

Matsuda, Shigeki [15]

Kuroiwa, Shingo [17,18]

Takeda, Kazuya [5,6]

Nakamura, Satoshi [15,16]

Affiliations:

[1] Ritsumeikan Univ, Kusatsu 5258577, Japan

[2] Kinki Univ, Kinokawa 6496493, Japan

[3] Murata Machinery Ltd, Kyoto 6128686, Japan

[4] Nagoya Univ, Grad Sch Informat Sci, Dept Media Sci, Nagoya, Aichi 4648603, Japan

[5] Nagoya Univ, Grad Sch Informat Sci, Nagoya, Aichi 4648603, Japan

[6] Nagoya Univ, Grad Sch, Nagoya, Aichi 4648603, Japan

[7] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058573, Japan

[8] Toyohashi Univ Technol, Dept Informat & Comp Sci, Toyohashi, Aichi 4418580, Japan

[9] Daido Univ, Sch Informat, Dept Informat Syst, Nagoya, Aichi 4578530, Japan

[10] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan

[11] Kobe Univ, Kobe, Hyogo 6578501, Japan

[12] Gifu Univ, Dept Comp Sci, Gifu 5011193, Japan

[13] Gifu Univ, Gifu 5011193, Japan

[14] Waseda Univ, Tokyo 1698050, Japan

[15] Natl Inst Informat & Commun Technol, Kyoto 6190288, Japan

[16] Natl Inst Informat & Commun Technol, MASTAR Project, Kyoto 6190288, Japan

[17] Chiba Univ, Grad Sch Adv Integrat Sci, Chiba 2638522, Japan

[18] Natl Inst Informat & Commun Technol, Chiba 2638522, Japan

Source:

ACOUSTICAL SCIENCE AND TECHNOLOGY | 2011 / Vol. 32 / No. 5

Keywords:

Reverberant speech database; Reverberant speech recognition; Various recording environments; Room impulse response; Evaluation framework;

DOI:

10.1250/ast.32.201

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

We have been distributing a new collection of databases and evaluation tools called CENSREC-4, which is a framework for evaluating distant-talking speech recognition in reverberant environments. The data contained in CENSREC-4 are connected digit utterances, as in CENSREC-1. The data comprise two subsets: "basic data sets" and "extra data sets." The basic data sets are used for evaluating speech data convolved with room impulse responses to simulate various reverberant conditions. The extra data sets consist of simulated data and corresponding real recorded data. Evaluation tools are presently provided only for the basic data sets and will be provided for the extra data sets in the future. The task of CENSREC-4 with a basic data set appears simple; however, experimental results show that CENSREC-4 provides a challenging reverberant speech-recognition task, in the sense that a traditional technique to improve recognition and a widely used criterion to represent the difficulty of recognition deliver poor performance. Within this context, this common framework can be an important step toward the future evolution of reverberant speech-recognition methodologies.


Pages: 201-210

Number of pages: 10

