VALID: A new practical audio-visual database, and comparative results

Cited by: 0
Authors
Fox, NA [1]
O'Mullane, BA [1]
Reilly, RB [1]
Affiliations
[1] Univ Coll Dublin, Dept Elect & Elect Engn, Dublin 4, Ireland
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Audio, face, and multi-modal person recognition systems deployed in uncontrolled scenarios typically perform worse than systems developed in highly controlled environments. To facilitate the development of robust audio, face, and multi-modal person recognition systems, the new large and realistic multi-modal (audio-visual) VALID database was acquired in a noisy "real world" office scenario with no control over illumination or acoustic noise. In this paper we describe the acquisition and content of the VALID database, which consists of five recording sessions of 106 subjects over a period of one month. Speaker identification experiments using visual speech features extracted from the mouth region are reported. Performance on the uncontrolled VALID database is compared with that on the controlled XM2VTS database. The best accuracies on VALID and XM2VTS are 63.21% and 97.17%, respectively. This highlights the degrading effect of an uncontrolled illumination environment and the importance of this database for deploying real-world applications. The VALID database is available to the academic community through http://ee.ucd.ie/validdb/.
Pages: 777-786
Page count: 10