CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments

Cited by: 2
Authors
Fukumori, Takahiro [1]
Nishiura, Takanobu [1]
Nakayama, Masato [2]
Denda, Yuki [3]
Kitaoka, Norihide [4]
Yamada, Takeshi [7]
Yamamoto, Kazumasa [8]
Tsuge, Satoru [9]
Fujimoto, Masakiyo [10]
Takiguchi, Tetsuya [11]
Miyajima, Chiyomi [5]
Tamura, Satoshi [12, 13]
Ogawa, Tetsuji [14]
Matsuda, Shigeki [15]
Kuroiwa, Shingo [17, 18]
Takeda, Kazuya [5, 6]
Nakamura, Satoshi [15, 16]
Affiliations
[1] Ritsumeikan Univ, Kusatsu 5258577, Japan
[2] Kinki Univ, Kinokawa 6496493, Japan
[3] Murata Machinery Ltd, Kyoto 6128686, Japan
[4] Nagoya Univ, Grad Sch Informat Sci, Dept Media Sci, Nagoya, Aichi 4648603, Japan
[5] Nagoya Univ, Grad Sch Informat Sci, Nagoya, Aichi 4648603, Japan
[6] Nagoya Univ, Grad Sch, Nagoya, Aichi 4648603, Japan
[7] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058573, Japan
[8] Toyohashi Univ Technol, Dept Informat & Comp Sci, Toyohashi, Aichi 4418580, Japan
[9] Daido Univ, Sch Informat, Dept Informat Syst, Nagoya, Aichi 4578530, Japan
[10] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
[11] Kobe Univ, Kobe, Hyogo 6578501, Japan
[12] Gifu Univ, Dept Comp Sci, Gifu 5011193, Japan
[13] Gifu Univ, Gifu 5011193, Japan
[14] Waseda Univ, Tokyo 1698050, Japan
[15] Natl Inst Informat & Commun Technol, Kyoto 6190288, Japan
[16] Natl Inst Informat & Commun Technol, MASTAR Project, Kyoto 6190288, Japan
[17] Chiba Univ, Grad Sch Adv Integrat Sci, Chiba 2638522, Japan
[18] Natl Inst Informat & Commun Technol, Chiba 2638522, Japan
Keywords
Reverberant speech database; Reverberant speech recognition; Various recording environments; Room impulse response; Evaluation framework;
DOI
10.1250/ast.32.201
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification code
070206; 082403;
Abstract
We have been distributing a new collection of databases and evaluation tools called CENSREC-4, a framework for evaluating distant-talking speech recognition in reverberant environments. As in CENSREC-1, the data in CENSREC-4 are connected-digit utterances. The data comprise two subsets: "basic data sets" and "extra data sets." The basic data sets are used for evaluation on speech data convolved with room impulse responses to simulate various reverberant conditions. The extra data sets consist of simulated data and the corresponding real recorded data. Evaluation tools are currently provided only for the basic data sets and will be extended to the extra data sets in the future. The CENSREC-4 task with a basic data set appears simple; however, experimental results show that it is a challenging reverberant speech-recognition task, in the sense that a traditional technique for improving recognition and a widely used criterion for representing the difficulty of recognition both perform poorly. In this context, this common framework can be an important step toward the future evolution of reverberant speech-recognition methodologies.
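As a rough illustration of what "room impulse response-convolved speech data" means in the abstract, the sketch below convolves a clean signal with an impulse response to simulate reverberation. It is not the CENSREC-4 tooling: the function names `reverberate` and `synthetic_rir`, the 8 kHz sample rate, and the toy decaying-noise impulse response are all assumptions made for illustration. The actual data sets use impulse responses from real rooms.

```python
import numpy as np

def reverberate(clean, rir):
    """Simulate reverberant speech by convolving a clean signal with an RIR."""
    clean = np.asarray(clean, dtype=float)
    rir = np.asarray(rir, dtype=float)
    # Full linear convolution: output length is len(clean) + len(rir) - 1.
    return np.convolve(clean, rir, mode="full")

def synthetic_rir(rt60_s, sample_rate, seed=0):
    """Toy RIR (hypothetical): white noise under an exponential decay envelope."""
    rng = np.random.default_rng(seed)
    n = int(rt60_s * sample_rate)
    t = np.arange(n) / sample_rate
    # Envelope amplitude drops by 60 dB (a factor of 10**-3) at t = rt60_s.
    decay = 10.0 ** (-3.0 * t / rt60_s)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct-path component
    return rir

# Usage: a unit impulse as a stand-in for a clean utterance (assumed 8 kHz).
sample_rate = 8000
clean = np.zeros(100)
clean[0] = 1.0
reverberant = reverberate(clean, synthetic_rir(0.25, sample_rate))
```

With a unit impulse as input, the output is the impulse response itself, which is a quick sanity check that the convolution is wired correctly.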
Pages: 201-210
Number of pages: 10