VALID: A new practical audio-visual database, and comparative results

Times cited: 0

Authors:

Fox, NA [1]

O'Mullane, BA [1]

Reilly, RB [1]

Affiliation:

[1] Univ Coll Dublin, Dept Elect & Elect Engn, Dublin 4, Ireland

Source:

AUDIO AND VIDEO BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS | 2005 / Vol. 3546

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The performance of deployed audio, face, and multi-modal person recognition systems in non-controlled scenarios is typically lower than that of systems developed in highly controlled environments. With the aim of facilitating the development of robust audio, face, and multi-modal person recognition systems, the new, large, and realistic multi-modal (audio-visual) VALID database was acquired in a noisy "real world" office scenario with no control over illumination or acoustic noise. In this paper we describe the acquisition and content of the VALID database, which consists of five recording sessions of 106 subjects over a period of one month. Speaker identification experiments using visual speech features extracted from the mouth region are reported. The performance on the uncontrolled VALID database is compared with that on the controlled XM2VTS database. The best accuracies on VALID and XM2VTS are 63.21% and 97.17%, respectively. This highlights the degrading effect of an uncontrolled illumination environment and the importance of this database for deploying real-world applications. The VALID database is available to the academic community through http://ee.ucd.ie/validdb/.
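The accuracies quoted above are closed-set speaker identification rates. As a rough, non-authoritative illustration, the sketch below assumes the common rank-1 definition: each test utterance is assigned to the enrolled subject with the highest score, and accuracy is the percentage of correct assignments. The score matrix, the function names `identification_accuracy` and `relative_drop`, and the toy data are hypothetical; the paper's visual feature extraction and classifier are not reproduced here. Only the two figures 97.17% and 63.21% come from the abstract, and the relative drop is simple arithmetic on them.

```python
from typing import Sequence


def identification_accuracy(scores: Sequence[Sequence[float]],
                            true_ids: Sequence[int]) -> float:
    """Closed-set, rank-1 identification accuracy in percent (assumed definition).

    scores[i][k] is a hypothetical similarity between test utterance i and
    enrolled subject k; a higher value means a better match.
    """
    if not scores or len(scores) != len(true_ids):
        raise ValueError("need one true identity per non-empty score row")
    correct = 0
    for row, truth in zip(scores, true_ids):
        # Pick the enrolled subject with the highest score for this utterance.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == truth
    return 100.0 * correct / len(true_ids)


def relative_drop(reference_acc: float, degraded_acc: float) -> float:
    """Accuracy lost, expressed as a percentage of the reference accuracy."""
    return 100.0 * (reference_acc - degraded_acc) / reference_acc


# Toy example with made-up scores: 3 test utterances, 3 enrolled subjects.
toy_scores = [[0.9, 0.1, 0.0],
              [0.2, 0.3, 0.5],
              [0.1, 0.7, 0.2]]
print(round(identification_accuracy(toy_scores, [0, 1, 1]), 2))  # 66.67

# Best accuracies reported in the abstract: XM2VTS 97.17 %, VALID 63.21 %.
print(round(97.17 - 63.21, 2))                # absolute gap in points: 33.96
print(round(relative_drop(97.17, 63.21), 1))  # relative drop in percent: 34.9
```

Read this way, the move from the controlled XM2VTS recordings to the uncontrolled VALID recordings costs about 34 percentage points, roughly a third of the controlled-condition accuracy.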

Pages: 777–786

Number of pages: 10
