Audiovisual Speaker Identification Based on Lip and Speech Modalities

Times cited: 0

Authors:

Chelali, Fatma [1]

Djeradi, Amar [1]

Affiliation:

[1] Univ Sci & Technol Houari Boumedienne, Fac Elect Engn & Comp Sci, Algiers, Algeria

Source:

International Arab Journal of Information Technology | 2017 / Vol. 14 / No. 1

Keywords:

Audiovisual speaker recognition; DCT; DWT; PLP; MFCC; Recognition; Information; Fusion

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this article, we present a bimodal speaker identification method that integrates acoustic and visual features, with the two audiovisual streams processed in parallel. We also propose a fusion technique that combines the two modalities to make the final recognition decision. Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers. Results show the importance of the visual information provided by the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT), in addition to the audio information captured by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) features. Furthermore, artificial neural networks such as the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network were investigated and tested successfully on this dataset, achieving good recognition performance when the acoustic and visual feature vectors were serially concatenated.
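The abstract names two ingredients that are easy to make concrete: 2D DCT features taken from the lip region, and serial concatenation of the acoustic and visual vectors. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a grayscale lip region of interest (ROI) has already been cropped, keeps a hypothetical 6x6 block of low-frequency DCT coefficients, and treats a 13-dimensional MFCC vector as computed elsewhere. The names `dct_matrix`, `lip_dct_features`, and `serial_fusion` are hypothetical, and the DWT and PLP features and the MLP/RBF classifiers are not shown.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal n x n DCT-II basis matrix."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)         # DC row uses the smaller scale factor
    return m


def lip_dct_features(roi: np.ndarray, keep: int = 6) -> np.ndarray:
    """2D DCT of a grayscale lip ROI, keeping the top-left keep x keep block.

    `keep` is a hypothetical setting chosen for illustration only.
    """
    h, w = roi.shape
    coeffs = dct_matrix(h) @ roi.astype(float) @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()   # low-frequency coefficients, row-major


def serial_fusion(acoustic: np.ndarray, visual: np.ndarray) -> np.ndarray:
    """Feature-level fusion by serial concatenation: [acoustic | visual]."""
    return np.concatenate([np.ravel(acoustic), np.ravel(visual)])


# Example with synthetic data (not the paper's dataset).
rng = np.random.default_rng(0)
lip_roi = rng.random((32, 48))           # hypothetical 32x48 lip crop
mfcc = rng.standard_normal(13)           # hypothetical 13 MFCCs for one frame
fused = serial_fusion(mfcc, lip_dct_features(lip_roi))
print(fused.shape)                       # (49,)
```

The fused 49-dimensional vector would then be the input to a classifier such as the MLP or RBF network mentioned in the abstract. Standardizing each modality with training-set statistics before concatenation is a common practice, so the larger-valued modality does not dominate, but it is omitted here for brevity and is not taken from the paper.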


Pages: 99-110

Number of pages: 12

50 entries in total

[31] Tongue-Lip Coordination in Speech Production in a Deaf Speaker
McGarr, NS
Harris, KS
Journal of the Acoustical Society of America, 1979, 65: S70
[32] Nonnative Audiovisual Speech Perception in Noise: Dissociable Effects of the Speaker and Listener
Xie, Zilong
Yi, Han-Gyol
Chandrasekaran, Bharath
PLOS ONE, 2014, 9 (12)
[33] Speaker identification utilizing noncontemporary speech
Hollien, H
Schwartz, R
Journal of Forensic Sciences, 2001, 46 (1): 63-67
[34] Speaker Identification with Distant Microphone Speech
Jin, Qin
Li, Runxin
Yang, Qian
Laskowski, Kornel
Schultz, Tanja
2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2010: 4518-4521
[35] Speaker Identification using Whispered Speech
Jawarkar, Naresh P.
Holambe, Raghunath S.
Basu, Tapan Kumar
2013 International Conference on Communication Systems and Network Technologies (CSNT 2013), 2013: 778-781
[36] Speaker Re-identification with Speaker Dependent Speech Enhancement
Shi, Yanpei
Huang, Qiang
Hain, Thomas
INTERSPEECH 2020, 2020: 1530-1534
[37] Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech
Sarma, Biswajit Dev
Das, Rohan Kumar
2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020: 610-615
[38] Robust lip-motion features for speaker identification
Çetingül, HE
Yemez, Y
Erzin, E
Tekalp, AM
2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols 1-5: Speech Processing, 2005: 509-512
[39] Real time lip contour extraction for speaker identification
Computer Vision and Pattern Recognition Lab., Beijing Institute of Technology, Beijing 100081, China
Jisuanji Gongcheng, 2006, 5 (202-204)
[40] Adaptive speaker identification with audiovisual cues for movie content analysis
Li, Y
Narayanan, SS
Kuo, CCJ
Pattern Recognition Letters, 2004, 25 (7): 777-791
