Audiovisual Speaker Identification Based on Lip and Speech Modalities

Cited by: 0

Authors:

Chelali, Fatma [1]

Djeradi, Amar [1]

Affiliation:

[1] Univ Sci & Technol Houari Boumedienne, Fac Elect Engn & Comp Sci, Algiers, Algeria

Source:

INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY | 2017 / Vol. 14 / No. 1

Keywords:

Audiovisual speaker recognition; DCT; DWT; PLP; MFCC; RECOGNITION; INFORMATION; FUSION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this article, we present a bimodal speaker identification method that integrates acoustic and visual features, with the two audiovisual stream modalities processed in parallel. We also propose a fusion technique that combines the two modalities to make the final recognition decision. Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers. Results show the importance of the visual information provided by the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT), in addition to the audio information carried by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) features. Furthermore, artificial neural networks such as the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network were investigated and tested successfully on this dataset, achieving good recognition performance with serial concatenation of the acoustic and visual vectors.
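The serial concatenation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-modality z-score normalisation, the function names `zscore` and `fuse_serial`, and the toy vectors are all assumptions made for illustration. The abstract only states that the acoustic and visual vectors are concatenated serially before classification.

```python
import math


def zscore(vec):
    """Scale one modality's feature vector to zero mean and unit variance.

    Assumption: the abstract does not say how (or whether) the modalities
    are normalised before fusion; z-scoring is one common choice.
    """
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for a constant vector
    return [(x - mean) / std for x in vec]


def fuse_serial(acoustic, visual):
    """Feature-level fusion: normalise each modality, then concatenate."""
    return zscore(acoustic) + zscore(visual)


# Hypothetical toy vectors; real MFCC/PLP and DCT/DWT vectors would be longer.
acoustic = [12.0, -3.5, 0.8, 4.1]
visual = [250.0, 31.0, -7.0]

fused = fuse_serial(acoustic, visual)
print(len(fused))  # length is len(acoustic) + len(visual)
```

The fused vector would then be passed to a classifier such as an MLP or RBF network. Normalising each modality separately keeps the larger-valued modality from dominating the joint vector, which is one reason such a step is often paired with concatenation.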


Pages: 99-110

Page count: 12
