VALID: A new practical audio-visual database, and comparative results

Citations: 0
Authors
Fox, NA [1]
O'Mullane, BA [1]
Reilly, RB [1]
Affiliations
[1] Univ Coll Dublin, Dept Elect & Elect Engn, Dublin 4, Ireland
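Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract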
The performance of audio, face, and multi-modal person recognition systems deployed in uncontrolled scenarios is typically lower than that of systems developed in highly controlled environments. To facilitate the development of robust audio, face, and multi-modal person recognition systems, the new, large, and realistic multi-modal (audio-visual) VALID database was acquired in a noisy "real world" office scenario with no control over illumination or acoustic noise. In this paper we describe the acquisition and content of the VALID database, which consists of five recording sessions of 106 subjects over a period of one month. Speaker identification experiments using visual speech features extracted from the mouth region are reported. The performance on the uncontrolled VALID database is compared with that on the controlled XM2VTS database. The best VALID- and XM2VTS-based accuracies are 63.21% and 97.17%, respectively. This highlights the degrading effect of an uncontrolled illumination environment and the importance of this database for deploying real-world applications. The VALID database is available to the academic community through http://ee.ucdie/validdb/.
Pages: 777-786
Page count: 10