Audiovisual Speaker Identification Based on Lip and Speech Modalities

Cited by: 0
Authors
Chelali, Fatma [1]
Djeradi, Amar [1]
Affiliations
[1] Univ Sci & Technol Houari Boumedienne, Fac Elect Engn & Comp Sci, Algiers, Algeria
Keywords
Audiovisual speaker recognition; DCT; DWT; PLP; MFCC; RECOGNITION; INFORMATION; FUSION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this article, we present a bimodal speaker identification method that integrates acoustic and visual features, with the audio and visual streams processed in parallel. We also propose a fusion technique that combines the two modalities to make the final recognition decision. Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers. Results show the importance of the visual information provided by the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT), in addition to the audio information given by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) coefficients. Furthermore, artificial neural networks, namely the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network, were investigated and tested successfully on this dataset, achieving good recognition performance with serial concatenation of the acoustic and visual vectors.
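The serial concatenation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 4x4 lip region, the number of retained DCT coefficients, and the placeholder audio vector (standing in for MFCC or PLP coefficients) are all assumptions made for illustration, and the MLP/RBF classifier stage is omitted.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(image):
    """Separable 2-D DCT-II: transform each row, then each column."""
    rows = [dct_1d(r) for r in image]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols is indexed [column][frequency]; transpose back to [u][v].
    return [list(r) for r in zip(*cols)]

def visual_features(lip_roi, block=2):
    """Keep the top-left block x block low-frequency DCT coefficients
    (hypothetical choice; the abstract does not state how many are kept)."""
    coeffs = dct_2d(lip_roi)
    return [coeffs[u][v] for u in range(block) for v in range(block)]

def fuse_serial(audio_vec, visual_vec):
    """Feature-level fusion: append the visual vector to the audio vector."""
    return list(audio_vec) + list(visual_vec)

# Toy inputs: a constant 4x4 "lip region" and a made-up 3-element audio vector.
lip_roi = [[10.0] * 4 for _ in range(4)]
audio_vec = [0.1, -0.2, 0.3]  # placeholder, not real MFCC/PLP values
fused = fuse_serial(audio_vec, visual_features(lip_roi))
```

The fused vector would then be passed to a classifier such as an MLP. For a constant image, only the DC coefficient of the DCT is non-zero, which makes the toy case easy to check by hand.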
Pages: 99-110
Number of pages: 12
Related papers
50 items in total
  • [41] Robust speech features based on wavelet transform with application to speaker identification
    Hsieh, CT
    Lai, E
    Wang, YC
    IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 2002, 149 (02): 108-114
  • [42] Speaker Identification for Whispered Speech based on Frequency Warping and Score Competition
    Fan, Xing
    Hansen, John H. L.
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 1313-1316
  • [43] Enhanced VQ-based algorithms for speech independent speaker identification
    Fan, NP
    Rosca, J
    AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688: 470-477
  • [44] Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers
    Kanda, Naoyuki
    Gaur, Yashesh
    Wang, Xiaofei
    Meng, Zhong
    Chen, Zhuo
    Zhou, Tianyan
    Yoshioka, Takuya
    INTERSPEECH 2020, 2020: 36-40
  • [45] Bimodal fusion of visual and speech data for audiovisual speaker recognition in noisy environment
    Chelali F.Z.
    International Journal of Information Technology, 2023, 15 (6): 3135-3145
  • [46] From 3-D Speaker Cloning to Text-to-Audiovisual-Speech
    Fagel, Sascha
    Elisei, Frederic
    Bailly, Gerard
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 2325-2325
  • [47] Speaker Modeling Using Emotional Speech for More Robust Speaker Identification
    M. Milošević
    Ž. Nedeljković
    U. Glavitsch
    Ž. Đurović
    Journal of Communications Technology and Electronics, 2019, 64: 1256-1265
  • [48] Speaker Modeling Using Emotional Speech for More Robust Speaker Identification
    Milosevic, M.
    Nedeljkovic, Z.
    Glavitsch, U.
    Durovic, Z.
    JOURNAL OF COMMUNICATIONS TECHNOLOGY AND ELECTRONICS, 2019, 64 (11): 1256-1265
  • [49] Reducing Speech Coding Distortion for Speaker Identification
    McCree, Alan
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 941-944
  • [50] Familiar and Unfamiliar Speaker Identification in Speech and Singing
    Taylor, Katelyn
    Gully, Amelia
    Daffern, Helena
    INTERSPEECH 2024, 2024: 472-476