Audio-visual biometric recognition by vector quantization

被引：3

作者：

Das, Amitava ^{[1
]}

Ghosh, Prasanta ^{[1
]}

机构：

[1] Microsoft Res India, Bangalore, Karnataka, India

来源：

2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP | 2006年

关键词：

D O I：

10.1109/SLT.2006.326843

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We present a Vector Quantization based bimodal (speech and face) biometric recognition method which delivers high performance amidst noise, illumination variations and occlusions (disguised mode) while requiring very little training data, memory storage and complexity of operation. A Transform VQ method delivers good face-recognition performance and a Text Dependent VQ method provides good recognition performance using speech. Simple fusion of two leads to a wider separation between the user-clusters in the combined feature space, leading to high performance.

引用

页码：166 / +

页数：2

共 50 条

[21] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
Hwang, Jung-Wook
Park, Jeongkyun
Park, Rae-Hong
Park, Hyung-Min
APPLIED ACOUSTICS, 2023, 211
[22] Audio-Visual Learning for Multimodal Emotion Recognition
Fan, Siyu
Jing, Jianan
Wang, Chongwen
SYMMETRY-BASEL, 2025, 17 (03):
[23] Audio-Visual Attention Networks for Emotion Recognition
Lee, Jiyoung
Kim, Sunok
Kim, Seungryong
Sohn, Kwanghoon
AVSU'18: PROCEEDINGS OF THE 2018 WORKSHOP ON AUDIO-VISUAL SCENE UNDERSTANDING FOR IMMERSIVE MULTIMEDIA, 2018, : 27 - 32
[24] A coupled HMM for audio-visual speech recognition
Nefian, AV
Liang, LH
Pi, XB
Xiaoxiang, L
Mao, C
Murphy, K
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 2013 - 2016
[25] Scene recognition with audio-visual sensor fusion
Devicharan, D
Mehrotra, KG
Mohan, CK
Varshney, PK
Zuo, L
Multisensor, Multisource Information Fusion: Architectures, Algorithms and Applications 2005, 2005, 5813 : 201 - 210
[26] Speaker independent audio-visual speech recognition
Zhang, Y
Levinson, S
Huang, T
2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 1073 - 1076
[27] Multifactor fusion for audio-visual speaker recognition
Chetty, Girija
Tran, Dat
LECTURE NOTES IN SIGNAL SCIENCE, INTERNET AND EDUCATION (SSIP'07/MIV'07/DIWEB'07), 2007, : 70 - +
[28] An asynchronous DBN for audio-visual speech recognition
Saenko, Kate
Livescu, Karen
2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 154 - +
[29] Deep operational audio-visual emotion recognition
Akturk, Kaan
Keceli, Ali Seydi
NEUROCOMPUTING, 2024, 588
[30] Audio-Visual Emotion Recognition in Video Clips
Noroozi, Fatemeh
Marjanovic, Marina
Njegus, Angelina
Escalera, Sergio
Anbarjafari, Gholamreza
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (01) : 60 - 75

← 1 2 3 4 5 →