Audio-visual biometric based speaker identification

Cited by: 1
Authors
Kar, Biswajit [1]
Bhatia, Sandeep [1]
Dutta, P. K. [1]
Affiliation
[1] Indian Inst Technol, Dept Elect Engn, Kharagpur 721302, W Bengal, India
Keywords
biometrics; speaker recognition; speaker model; audio visual speech recognition
DOI
10.1109/ICCIMA.2007.21
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a multimodal audio-visual speaker identification system. The proposed system decomposes the information in a video stream into two components: speech and lip motion. Studies have shown that lip information conveys not only speech content but also characteristics of a person's identity. Fusing this information with speech information yields robust person identification under adverse conditions. Gaussian mixture models (GMMs) and hidden Markov models (HMMs) are used throughout this work for the tasks of text-dependent speaker recognition and mouth tracking. Performance is evaluated on a dataset of 22 Indian speakers of different ethnicities, each uttering a sentence. The results show that the biometric system performs significantly better when both audio and video features are used.
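The idea of combining speech and lip evidence can be illustrated with a minimal sketch. It assumes score-level (late) fusion of per-speaker GMM log-likelihoods, which the abstract does not confirm as the authors' scheme. The function names, the model layout (weights, means, variances per speaker), and the fusion weight `alpha` are all hypothetical.

```python
import math

def diag_gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(ll)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)

def identify(audio_frames, lip_frames, audio_models, lip_models, alpha=0.7):
    """Pick the speaker with the highest weighted sum of audio and lip scores.

    alpha is an assumed fusion weight, not a value taken from the paper.
    Each model is a (weights, means, variances) tuple keyed by speaker name.
    """
    scores = {}
    for spk in audio_models:
        a = diag_gmm_loglik(audio_frames, *audio_models[spk])
        v = diag_gmm_loglik(lip_frames, *lip_models[spk])
        scores[spk] = alpha * a + (1 - alpha) * v
    return max(scores, key=scores.get), scores
```

Lowering `alpha` shifts trust toward the lip stream, which is one plausible way to cope with noisy audio; how the weight should be chosen is left open here.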
Pages: 94-98
Page count: 5
Related papers
50 in total
  • [1] Audio-visual speaker identification based on the use of dynamic audio and visual features
    Fox, N
    Reilly, RB
    AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 743 - 751
  • [2] A Bayesian approach to audio-visual speaker identification
    Nefian, AV
    Liang, LH
    Fu, TY
    Liu, XX
    AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 761 - 769
  • [3] ENVIRONMENTALLY ROBUST AUDIO-VISUAL SPEAKER IDENTIFICATION
    Schoenherr, Lea
    Orth, Dennis
    Heckmann, Martin
    Kolossa, Dorothea
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 312 - 318
  • [4] Audio-Visual Feature Fusion for Speaker Identification
    Almaadeed, Noor
    Aggoun, Amar
    Amira, Abbes
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT I, 2012, 7663 : 56 - 67
  • [5] A Visual Signal Reliability for Robust Audio-Visual Speaker Identification
    Tariquzzaman, Md.
    Kim, Jin Young
    Na, Seung You
    Kim, Hyoung-Gook
    Har, Dongsoo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (10): : 2052 - 2055
  • [6] Audio-visual speaker identification with asynchronous articulatory feature
    Chen, Yanxiang
    Liu, M.
    ELECTRONICS LETTERS, 2010, 46 (03) : 242 - U77
  • [7] Fuzzy audio-visual feature maps for speaker identification
    Chibelushi, CC
    APPLICATIONS AND SCIENCE IN SOFT COMPUTING, 2004, : 317 - 322
  • [8] A confidence-based late fusion framework for audio-visual biometric identification
    Alam, Mohammad Rafiqul
    Bennamoun, Mohammed
    Togneri, Roberto
    Sohel, Ferdous
    PATTERN RECOGNITION LETTERS, 2015, 52 : 65 - 71
  • [9] Audio-Visual Synchronisation for Speaker Diarisation
    Garau, Giulia
    Dielmann, Alfred
    Bourlard, Herve
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2662 - +
  • [10] Audio-visual speaker identification using coupled hidden markov models
    Fu, T
    Liu, XX
    Liang, LH
    Pi, XB
    Nefian, AV
    2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 3, PROCEEDINGS, 2003, : 29 - 32