Audiovisual Speaker Identification Based on Lip and Speech Modalities

Cited by: 0
Authors
Chelali, Fatma [1]
Djeradi, Amar [1]
Affiliations
[1] Univ Sci & Technol Houari Boumedienne, Fac Elect Engn & Comp Sci, Algiers, Algeria
Keywords
Audiovisual speaker recognition; DCT; DWT; PLP; MFCC; RECOGNITION; INFORMATION; FUSION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this article, we present a bimodal speaker identification method that integrates acoustic and visual features, with the audio and visual streams processed in parallel. We also propose a fusion technique that combines the two modalities to make the final recognition decision. Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers. Results show the importance of the visual information provided by the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT), in addition to the audio information given by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) coefficients. Furthermore, artificial neural networks such as the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network were tested successfully on this dataset and achieved good recognition performance with serial concatenation of the acoustic and visual vectors.
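The serial concatenation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 13-element audio vector standing in for MFCCs, the 8x12 lip region, and the choice to keep a 4x4 block of low-frequency DCT coefficients are all assumptions made for illustration, since the abstract does not state these parameters.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        # The DC term (k == 0) uses a different normalisation factor.
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

def dct_2d(img):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major

def lip_dct_features(roi, keep=4):
    """Keep the keep x keep low-frequency block (hypothetical choice)."""
    coeffs = dct_2d(roi)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

def fuse_serial(audio_vec, visual_vec):
    """Feature-level fusion: append the visual vector to the audio vector."""
    return list(audio_vec) + list(visual_vec)

# Placeholder inputs (assumed shapes, not data from the article).
audio_vec = [0.1 * i for i in range(13)]                        # stand-in for 13 MFCCs
roi = [[((r * c) % 7) / 7.0 for c in range(12)] for r in range(8)]  # stand-in lip image
fused = fuse_serial(audio_vec, lip_dct_features(roi))
print(len(fused))
```

Under these assumptions the fused vector has 13 + 16 = 29 elements and would be the input to a classifier such as an MLP or RBF network. In practice the audio and visual parts are often normalised before concatenation so that neither modality dominates.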
Pages: 99-110
Page count: 12