CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments

Cited by: 2
Authors
Fukumori, Takahiro [1]
Nishiura, Takanobu [1]
Nakayama, Masato [2]
Denda, Yuki [3]
Kitaoka, Norihide [4]
Yamada, Takeshi [7]
Yamamoto, Kazumasa [8]
Tsuge, Satoru [9]
Fujimoto, Masakiyo [10]
Takiguchi, Tetsuya [11]
Miyajima, Chiyomi [5]
Tamura, Satoshi [12, 13]
Ogawa, Tetsuji [14]
Matsuda, Shigeki [15]
Kuroiwa, Shingo [17, 18]
Takeda, Kazuya [5, 6]
Nakamura, Satoshi [15, 16]
Affiliations
[1] Ritsumeikan Univ, Kusatsu 5258577, Japan
[2] Kinki Univ, Kinokawa 6496493, Japan
[3] Murata Machinery Ltd, Kyoto 6128686, Japan
[4] Nagoya Univ, Grad Sch Informat Sci, Dept Media Sci, Nagoya, Aichi 4648603, Japan
[5] Nagoya Univ, Grad Sch Informat Sci, Nagoya, Aichi 4648603, Japan
[6] Nagoya Univ, Grad Sch, Nagoya, Aichi 4648603, Japan
[7] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058573, Japan
[8] Toyohashi Univ Technol, Dept Informat & Comp Sci, Toyohashi, Aichi 4418580, Japan
[9] Daido Univ, Sch Informat, Dept Informat Syst, Nagoya, Aichi 4578530, Japan
[10] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
[11] Kobe Univ, Kobe, Hyogo 6578501, Japan
[12] Gifu Univ, Dept Comp Sci, Gifu 5011193, Japan
[13] Gifu Univ, Gifu 5011193, Japan
[14] Waseda Univ, Tokyo 1698050, Japan
[15] Natl Inst Informat & Commun Technol, Kyoto 6190288, Japan
[16] Natl Inst Informat & Commun Technol, MASTAR Project, Kyoto 6190288, Japan
[17] Chiba Univ, Grad Sch Adv Integrat Sci, Chiba 2638522, Japan
[18] Natl Inst Informat & Commun Technol, Chiba 2638522, Japan
Keywords
Reverberant speech database; Reverberant speech recognition; Various recording environments; Room impulse response; Evaluation framework
DOI
10.1250/ast.32.201
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We have been distributing CENSREC-4, a new collection of databases and evaluation tools that serves as a framework for evaluating distant-talking speech recognition in reverberant environments. As in CENSREC-1, the data in CENSREC-4 are connected-digit utterances. The data comprise two subsets: "basic data sets" and "extra data sets." The basic data sets consist of speech convolved with room impulse responses to simulate various reverberant conditions. The extra data sets consist of simulated data and the corresponding real recorded data. Evaluation tools are currently provided only for the basic data sets and will be provided for the extra data sets in the future. The CENSREC-4 task with a basic data set appears simple; however, experimental results show that it is a challenging reverberant speech-recognition task, in the sense that both a traditional technique for improving recognition and a widely used criterion for representing recognition difficulty perform poorly. In this context, this common framework can be an important step toward the future evolution of reverberant speech-recognition methodologies.
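As a rough illustration of how room impulse response-convolved speech can simulate reverberation, the sketch below convolves a clean signal with an impulse response using NumPy. It is a minimal sketch under stated assumptions, not the CENSREC-4 tooling: the function names `simulate_reverberant_speech` and `synthetic_rir`, the sample rate, and the exponentially decaying noise model of the impulse response are all hypothetical choices made for this example and are not taken from the CENSREC-4 distribution.

```python
import numpy as np


def simulate_reverberant_speech(clean, rir):
    """Convolve a clean utterance with a room impulse response (RIR).

    Both inputs are 1-D sequences of samples at the same sample rate.
    The output has len(clean) + len(rir) - 1 samples because the
    reverberant tail extends past the end of the clean signal.
    """
    clean = np.asarray(clean, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    return np.convolve(clean, rir, mode="full")


def synthetic_rir(sample_rate=16000, rt60=0.5, seed=0):
    """Toy RIR (an assumption for illustration, not a measured response).

    White noise shaped by an exponential envelope whose amplitude has
    dropped by 60 dB (a factor of 10**-3) after rt60 seconds.
    """
    rng = np.random.default_rng(seed)
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    envelope = np.exp(np.log(1e-3) * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path component
    return rir / np.max(np.abs(rir))


if __name__ == "__main__":
    sample_rate = 16000
    # Stand-in for a clean utterance: one second of a 440 Hz tone.
    t = np.arange(sample_rate) / sample_rate
    clean = 0.1 * np.sin(2 * np.pi * 440.0 * t)
    reverberant = simulate_reverberant_speech(clean, synthetic_rir(sample_rate))
    print(len(clean), len(reverberant))
```

With a measured impulse response in place of the synthetic one, the same convolution would yield simulated data for that room; for long signals an FFT-based convolution is usually faster than direct convolution.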
Pages: 201-210
Number of pages: 10