Audiovisual Speaker Identification Based on Lip and Speech Modalities

Cited by: 0

Authors:

Chelali, Fatma [1]

Djeradi, Amar [1]

Affiliation:

[1] Univ Sci & Technol Houari Boumedienne, Fac Elect Engn & Comp Sci, Algiers, Algeria

Source:

INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY | 2017 / Vol. 14 / No. 01

Keywords:

Audiovisual speaker recognition; DCT; DWT; PLP; MFCC; RECOGNITION; INFORMATION; FUSION;

DOI:

Not available

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this article, we present a bimodal speaker identification method that integrates acoustic and visual features and processes the two audiovisual stream modalities in parallel. We also propose a fusion technique that combines the two modalities to make the final recognition decision. Experiments are conducted on an audiovisual dataset of the 28 Arabic syllables pronounced by ten speakers. The results show the importance of the visual information provided by the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) in addition to the audio information given by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) features. Furthermore, artificial neural networks such as the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network were investigated and tested successfully on this dataset, giving good recognition performance when the acoustic and visual vectors are serially concatenated.
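The abstract names DCT lip features and serial concatenation of acoustic and visual vectors without giving implementation details. The following is a minimal sketch of that idea under stated assumptions: a square grey-level lip region of interest, an orthonormal 2-D DCT-II, the top-left m × m (low-frequency) coefficients kept as the visual vector, and plain concatenation with no normalization. The function names (`dct2`, `visual_features`, `fuse_serial`), the 8 × 8 block, the 4 × 4 retained coefficients and the 13 MFCC-like values are hypothetical illustrations, not the authors' settings.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct2(block: Matrix) -> Matrix:
    """Orthonormal 2-D DCT-II of a square N x N block (direct O(N^4) form)."""
    n = len(block)

    def alpha(k: int) -> float:
        # Orthonormal scaling: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                for y in range(n):
                    cy = math.cos(math.pi * (2 * y + 1) * v / (2 * n))
                    s += block[x][y] * cx * cy
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def visual_features(lip_roi: Matrix, m: int) -> List[float]:
    """Hypothetical visual vector: the m x m low-frequency (top-left) DCT coefficients."""
    coeffs = dct2(lip_roi)
    return [coeffs[u][v] for u in range(m) for v in range(m)]


def fuse_serial(acoustic: List[float], visual: List[float]) -> List[float]:
    """Feature-level fusion by serial concatenation: [acoustic | visual]."""
    return list(acoustic) + list(visual)


if __name__ == "__main__":
    # Made-up 8x8 grey-level lip region and 13 MFCC-like values (not real data).
    roi = [[float((x * 8 + y) % 17) for y in range(8)] for x in range(8)]
    mfcc = [0.1 * i for i in range(13)]
    fused = fuse_serial(mfcc, visual_features(roi, 4))
    print(len(fused))  # 13 acoustic + 16 visual = 29
```

Concatenating raw vectors lets the modality with larger magnitudes dominate a classifier such as an MLP or RBF network, so per-dimension normalization is commonly applied before fusion; the abstract does not say whether the authors do this.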


Pages: 99-110

Page count: 12
