VALID: A new practical audio-visual database, and comparative results

Cited by: 0
Authors
Fox, NA [1]
O'Mullane, BA [1]
Reilly, RB [1]
Affiliations
[1] Univ Coll Dublin, Dept Elect & Elect Engn, Dublin 4, Ireland
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The performance of audio, face, and multi-modal person recognition systems deployed in uncontrolled scenarios is typically lower than that of systems developed in highly controlled environments. To facilitate the development of robust audio, face, and multi-modal person recognition systems, the new large and realistic multi-modal (audio-visual) VALID database was acquired in a noisy "real world" office scenario with no control over illumination or acoustic noise. In this paper we describe the acquisition and content of the VALID database, which consists of five recording sessions of 106 subjects over a period of one month. Speaker identification experiments using visual speech features extracted from the mouth region are reported. Performance on the uncontrolled VALID database is compared with performance on the controlled XM2VTS database. The best VALID and XM2VTS based accuracies are 63.21% and 97.17% respectively. This highlights the degrading effect of an uncontrolled illumination environment and the importance of this database for deploying real-world applications. The VALID database is available to the academic community through http://ee.ucdie/validdb/.
Pages: 777-786
Page count: 10