A hybrid approach for automatic lip localization and viseme classification to enhance visual speech recognition

Cited by: 0
Affiliation: Multimedia Information Systems and Advanced Computing Laboratory, High Institute of Computer Science and Multimedia, University of Sfax, Sfax, Tunisia
Source: Integrated Computer-Aided Engineering, 2008, 15(3): 253-266
Keywords: Extraction; Audition; Speech recognition
DOI: 10.3233/ica-2008-15305
Abstract:
An automatic lip-reading system is an assistive technology for hearing-impaired or elderly people. One can imagine, for example, a dependent person giving commands to a machine with a simple lip movement or by pronouncing a single viseme (visual phoneme). A lip-reading system decomposes into three subsystems: a lip localization subsystem, a feature extraction subsystem, and a classification subsystem that maps feature vectors to visemes. The major difficulty in such a system is the extraction of visual speech descriptors, which requires automatic localization and tracking of the labial gestures. In this paper we present a new automatic approach for localizing lip points of interest (POIs) and extracting features on a speaker's face, based on mouth color information and a geometrical model of the lips. The extracted visual information is then classified in order to recognize the uttered viseme. We have developed the Automatic Lip Feature Extraction (ALiFE) prototype and evaluated it with multiple speakers under natural conditions. Experiments cover a set of French visemes uttered by different speakers. Results show that the system recognizes 94.64% of the tested French visemes. © 2008 IOS Press and the author(s). All rights reserved.
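The abstract describes a three-stage pipeline: color-based localization of lip points of interest, extraction of geometric lip features, and classification of the resulting feature vectors into visemes. The sketch below is only a minimal illustration of that general idea, not the authors' ALiFE implementation: the pseudo-hue threshold, the four bounding-box features, the k-NN classifier, and the helper names lip_mask, geometric_features, and train_viseme_classifier are assumptions made for this example (Python with OpenCV and scikit-learn), and the input frames are assumed to be pre-cropped mouth regions.

    # Minimal sketch (not the ALiFE system): color-based lip segmentation,
    # simple geometric features, and a k-NN viseme classifier.
    import numpy as np
    import cv2
    from sklearn.neighbors import KNeighborsClassifier

    def lip_mask(bgr_mouth_frame, threshold=0.55):
        """Segment lip-like pixels with a pseudo-hue ratio R/(R+G),
        which tends to be higher on lips than on surrounding skin.
        The 0.55 threshold is an assumption, not a published value."""
        b, g, r = cv2.split(bgr_mouth_frame.astype(np.float32))
        pseudo_hue = r / (r + g + 1e-6)
        mask = (pseudo_hue > threshold).astype(np.uint8) * 255
        # Keep only the largest connected blob as the mouth region.
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        if n <= 1:
            return mask
        largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        return (labels == largest).astype(np.uint8) * 255

    def geometric_features(mask):
        """Geometric descriptors of the lip blob: width, height,
        aspect ratio, and fill ratio of its bounding box."""
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            return np.zeros(4, dtype=np.float32)
        w = xs.max() - xs.min() + 1
        h = ys.max() - ys.min() + 1
        fill = len(xs) / float(w * h)
        return np.array([w, h, w / float(h), fill], dtype=np.float32)

    def train_viseme_classifier(mouth_frames, viseme_labels):
        """Map feature vectors to viseme labels with a k-NN classifier
        (a stand-in for whatever classifier the paper actually uses)."""
        X = np.stack([geometric_features(lip_mask(f)) for f in mouth_frames])
        clf = KNeighborsClassifier(n_neighbors=3)
        clf.fit(X, viseme_labels)
        return clf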
Related papers (50 in total)
  • [31] Multistage classification scheme to enhance speech emotion recognition
    Poorna, S. S.
    Nair, G. J.
    International Journal of Speech Technology, 2019, 22(2): 327 - 340
  • [33] Appearance and shape-based hybrid visual feature extraction: toward audio–visual automatic speech recognition
    Saswati Debnath
    Pinki Roy
    Signal, Image and Video Processing, 2021, 15 : 25 - 32
  • [34] Viseme-Dependent Weight Optimization for CHMM-Based Audio-Visual Speech Recognition
    Karpov, Alexey
    Ronzhin, Andrey
    Markov, Konstantin
    Zelezny, Milos
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010: 2686+
  • [35] An automatic approach for classification and categorisation of lip morphological traits
    Abbas, Hawraa H.
    Hicks, Yulia
    Zhurov, Alexei
    Marshall, David
    Claes, Peter
    Wilson-Nagrani, Caryl
    Richmond, Stephen
    PLOS ONE, 2019, 14(10)
  • [36] Automatic speech recognition using audio visual cues
    Yashwanth, H
    Mahendrakar, H
    David, S
    PROCEEDINGS OF THE IEEE INDICON 2004, 2004: 166 - 169
  • [37] Robust phoneme classification for automatic speech recognition using hybrid features and an amalgamated learning model
    Khwaja, Mohammed Kamal
    Vikash, Peddakota
    Arulmozhivarman, P.
    Lui, Simon
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (04) : 895 - 905
  • [38] Sequential Classification Criteria for NNs in Automatic Speech Recognition
    Wang, Guangsen
    Sim, Khe Chai
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 448 - 451
  • [39] Lip movement synthesis in audio-visual speech recognition system
    Li, Junquan
    Yin, Yixin
    Proceedings of the 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE'05), 2005: 461 - 465