Audiovisual Speaker Identification Based on Lip and Speech Modalities

Authors
Chelali, Fatma [1]
Djeradi, Amar [1]
Affiliation
[1] Univ Sci & Technol Houari Boumedienne, Fac Elect Engn & Comp Sci, Algiers, Algeria
Keywords
Audiovisual speaker recognition; DCT; DWT; PLP; MFCC; RECOGNITION; INFORMATION; FUSION
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this article, we present a bimodal speaker identification method that integrates acoustic and visual features, processing the two audiovisual streams in parallel. We also propose a fusion technique that combines the two modalities to reach the final recognition decision. Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers. The results show the importance of the visual information provided by the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT), in addition to the audio information given by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) coefficients. Furthermore, artificial neural networks such as the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network were investigated and tested on this dataset, achieving good recognition performance when the acoustic and visual vectors were combined by serial concatenation.
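The abstract gives no implementation details, so the following is only a minimal sketch of feature-level fusion by serial concatenation, under stated assumptions: the lip region of interest (ROI) is a small grey-level matrix, the visual features are the k×k low-frequency corner of an orthonormal 2D DCT-II, and each modality vector is scaled to unit L2 norm before concatenation so that neither dominates by magnitude. The acoustic vector stands in for MFCC or PLP coefficients computed elsewhere. The function names (`dct2`, `lip_dct_features`, `l2_normalize`, `fuse_serial`), the choice of `k`, and the normalization are hypothetical; the DWT features and the MLP/RBF classifier stage are not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct2(block: Matrix) -> Matrix:
    """Orthonormal 2D DCT-II of an M x N block, computed directly (O(M^2 N^2))."""
    m, n = len(block), len(block[0])
    out = [[0.0] * n for _ in range(m)]
    for u in range(m):
        au = math.sqrt(1.0 / m) if u == 0 else math.sqrt(2.0 / m)
        for v in range(n):
            av = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
            s = 0.0
            for x in range(m):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * m))
                for y in range(n):
                    cy = math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                    s += block[x][y] * cx * cy
            out[u][v] = au * av * s
    return out


def lip_dct_features(roi: Matrix, k: int = 4) -> List[float]:
    """Visual vector: the k x k low-frequency DCT corner, row by row.

    Assumption (hypothetical): keeping this corner is one common way to
    select DCT coefficients; the paper's actual selection is not stated.
    """
    if k > len(roi) or k > len(roi[0]):
        raise ValueError("k must not exceed the ROI dimensions")
    coeffs = dct2(roi)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_serial(acoustic: List[float], visual: List[float]) -> List[float]:
    """Serial concatenation: normalize each modality, then append visual after acoustic.

    Assumption (hypothetical): per-modality L2 normalization before concatenation.
    """
    return l2_normalize(acoustic) + l2_normalize(visual)


if __name__ == "__main__":
    roi = [[128.0] * 8 for _ in range(8)]   # flat 8x8 grey patch as a toy lip ROI
    visual = lip_dct_features(roi, k=2)
    acoustic = [3.0, 4.0]                   # placeholder for MFCC/PLP coefficients
    print(fuse_serial(acoustic, visual))
```

For a flat 8×8 patch of value 128, only the DC coefficient is non-zero (128 × 8 = 1024), so `lip_dct_features(roi, k=2)` yields approximately `[1024.0, 0.0, 0.0, 0.0]`, and fusing it with the placeholder acoustic vector `[3.0, 4.0]` gives approximately `[0.6, 0.8, 1.0, 0.0, 0.0, 0.0]`, a single vector that a classifier such as an MLP or RBF network could then take as input.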
Pages: 99–110
Page count: 12