CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments

Cited by: 2
Authors
Fukumori, Takahiro [1]
Nishiura, Takanobu [1]
Nakayama, Masato [2]
Denda, Yuki [3]
Kitaoka, Norihide [4]
Yamada, Takeshi [7]
Yamamoto, Kazumasa [8]
Tsuge, Satoru [9]
Fujimoto, Masakiyo [10]
Takiguchi, Tetsuya [11]
Miyajima, Chiyomi [5]
Tamura, Satoshi [12, 13]
Ogawa, Tetsuji [14]
Matsuda, Shigeki [15]
Kuroiwa, Shingo [17, 18]
Takeda, Kazuya [5, 6]
Nakamura, Satoshi [15, 16]
Affiliations
[1] Ritsumeikan Univ, Kusatsu 5258577, Japan
[2] Kinki Univ, Kinokawa 6496493, Japan
[3] Murata Machinery Ltd, Kyoto 6128686, Japan
[4] Nagoya Univ, Grad Sch Informat Sci, Dept Media Sci, Nagoya, Aichi 4648603, Japan
[5] Nagoya Univ, Grad Sch Informat Sci, Nagoya, Aichi 4648603, Japan
[6] Nagoya Univ, Grad Sch, Nagoya, Aichi 4648603, Japan
[7] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058573, Japan
[8] Toyohashi Univ Technol, Dept Informat & Comp Sci, Toyohashi, Aichi 4418580, Japan
[9] Daido Univ, Sch Informat, Dept Informat Syst, Nagoya, Aichi 4578530, Japan
[10] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
[11] Kobe Univ, Kobe, Hyogo 6578501, Japan
[12] Gifu Univ, Dept Comp Sci, Gifu 5011193, Japan
[13] Gifu Univ, Gifu 5011193, Japan
[14] Waseda Univ, Tokyo 1698050, Japan
[15] Natl Inst Informat & Commun Technol, Kyoto 6190288, Japan
[16] Natl Inst Informat & Commun Technol, MASTAR Project, Kyoto 6190288, Japan
[17] Chiba Univ, Grad Sch Adv Integrat Sci, Chiba 2638522, Japan
[18] Natl Inst Informat & Commun Technol, Chiba 2638522, Japan
Keywords
Reverberant speech database; Reverberant speech recognition; Various recording environments; Room impulse response; Evaluation framework;
DOI
10.1250/ast.32.201
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
We have been distributing CENSREC-4, a new collection of databases and evaluation tools that serves as a framework for evaluating distant-talking speech recognition in reverberant environments. As in CENSREC-1, the data in CENSREC-4 are connected-digit utterances. The data comprise two subsets: "basic data sets" and "extra data sets." The basic data sets are used to evaluate speech data convolved with room impulse responses to simulate various reverberant conditions. The extra data sets consist of simulated data and the corresponding real recorded data. Evaluation tools are currently provided only for the basic data sets and will be provided for the extra data sets in the future. The CENSREC-4 task with a basic data set appears simple; however, experimental results show that it is a challenging reverberant speech-recognition task, in the sense that a traditional technique for improving recognition and a widely used criterion for representing recognition difficulty both perform poorly. In this context, this common framework can be an important step toward the future development of reverberant speech-recognition methodologies.
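As a rough illustration of what "room impulse response-convolved speech" means, the sketch below convolves a short clean signal with an impulse response to produce a reverberant one. It is a minimal, pure-Python example and not the CENSREC-4 tooling: the function names `convolve` and `synthetic_rir`, the toy signal, and the exponentially decaying impulse response are all assumptions made for illustration, whereas the abstract does not describe how the actual impulse responses were obtained.

```python
import math


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (direct O(N*M) form)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


def synthetic_rir(length, decay):
    """Hypothetical toy impulse response: a unit direct path followed by
    an exponentially decaying tail with alternating sign. This only
    stands in for a real room impulse response."""
    return [math.exp(-decay * k) * (1.0 if k % 2 == 0 else -1.0)
            for k in range(length)]


# Toy "clean speech" samples (assumed values, not real data).
clean = [1.0, 0.5, -0.25, 0.0]
rir = synthetic_rir(length=6, decay=0.8)

# Simulated reverberant signal: clean speech convolved with the RIR.
reverberant = convolve(clean, rir)
print(len(reverberant))
```

The output is longer than the input by `len(rir) - 1` samples, which is the reverberant tail added by the impulse response. For real audio, an FFT-based convolution would normally replace the direct double loop.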
Pages: 201-210
Page count: 10