A hybrid approach for automatic lip localization and viseme classification to enhance visual speech recognition

Cited by: 0
Affiliation: Multimedia Information Systems and Advanced Computing Laboratory, High Institute of Computer Science and Multimedia, University of Sfax, Sfax, Tunisia
Source: Integrated Computer-Aided Engineering, 2008, 15(3): 253-266
Keywords: Extraction; Audition; Speech recognition
DOI: 10.3233/ica-2008-15305
Abstract:
An automatic lip-reading system is an assistive technology for hearing-impaired or elderly people. One can imagine, for example, a dependent person giving commands to a machine with a simple lip movement or by pronouncing a single viseme (visual phoneme). A lip-reading system decomposes into three subsystems: a lip localization subsystem, a feature extraction subsystem, and a classification subsystem that maps feature vectors to visemes. The major difficulty in such a system is the extraction of visual speech descriptors, which requires automatic localization and tracking of the labial gestures. In this paper we present a new automatic approach for localizing lip points of interest (POIs) and extracting features on a speaker's face, based on mouth color information and a geometrical model of the lips. The extracted visual information is then classified in order to recognize the uttered viseme. We have developed the Automatic Lip Feature Extraction (ALiFE) prototype and evaluated it with multiple speakers under natural conditions. Experiments cover a set of French visemes uttered by different speakers. Results show that the system recognizes 94.64% of the tested French visemes. © 2008 IOS Press and the author(s). All rights reserved.
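The abstract describes a three-stage pipeline: color-based localization of lip points of interest, extraction of geometric lip features, and classification of the resulting feature vectors into visemes. The sketch below is only a minimal illustration of that general idea, not the authors' ALiFE implementation: the pseudo-hue threshold, the four bounding-box features, the k-NN classifier, and the helper names lip_mask, geometric_features, and train_viseme_classifier are assumptions made for this example (Python with OpenCV and scikit-learn), and the input frames are assumed to be pre-cropped mouth regions.

    # Minimal sketch (not the ALiFE system): color-based lip segmentation,
    # simple geometric features, and a k-NN viseme classifier.
    import numpy as np
    import cv2
    from sklearn.neighbors import KNeighborsClassifier

    def lip_mask(bgr_mouth_frame, threshold=0.55):
        """Segment lip-like pixels with a pseudo-hue ratio R/(R+G),
        which tends to be higher on lips than on surrounding skin.
        The 0.55 threshold is an assumption, not a published value."""
        b, g, r = cv2.split(bgr_mouth_frame.astype(np.float32))
        pseudo_hue = r / (r + g + 1e-6)
        mask = (pseudo_hue > threshold).astype(np.uint8) * 255
        # Keep only the largest connected blob as the mouth region.
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        if n <= 1:
            return mask
        largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        return (labels == largest).astype(np.uint8) * 255

    def geometric_features(mask):
        """Geometric descriptors of the lip blob: width, height,
        aspect ratio, and fill ratio of its bounding box."""
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            return np.zeros(4, dtype=np.float32)
        w = xs.max() - xs.min() + 1
        h = ys.max() - ys.min() + 1
        fill = len(xs) / float(w * h)
        return np.array([w, h, w / float(h), fill], dtype=np.float32)

    def train_viseme_classifier(mouth_frames, viseme_labels):
        """Map feature vectors to viseme labels with a k-NN classifier
        (a stand-in for whatever classifier the paper actually uses)."""
        X = np.stack([geometric_features(lip_mask(f)) for f in mouth_frames])
        clf = KNeighborsClassifier(n_neighbors=3)
        clf.fit(X, viseme_labels)
        return clf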
Related papers (50 in total)
  • [31] Multistage classification scheme to enhance speech emotion recognition
    Poorna, S. S.
    Nair, G. J.
    International Journal of Speech Technology, 2019, 22(2): 327 - 340
  • [33] Appearance and shape-based hybrid visual feature extraction: toward audio–visual automatic speech recognition
    Saswati Debnath
    Pinki Roy
    Signal, Image and Video Processing, 2021, 15 : 25 - 32
  • [34] Viseme-Dependent Weight Optimization for CHMM-Based Audio-Visual Speech Recognition
    Karpov, Alexey
    Ronzhin, Andrey
    Markov, Konstantin
    Zelezny, Milos
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010: 2686+
  • [35] An automatic approach for classification and categorisation of lip morphological traits
    Abbas, Hawraa H.
    Hicks, Yulia
    Zhurov, Alexei
    Marshall, David
    Claes, Peter
    Wilson-Nagrani, Caryl
    Richmond, Stephen
    PLOS ONE, 2019, 14(10)
  • [36] Automatic speech recognition using audio visual cues
    Yashwanth, H
    Mahendrakar, H
    David, S
    PROCEEDINGS OF THE IEEE INDICON 2004, 2004: 166 - 169
  • [37] Robust phoneme classification for automatic speech recognition using hybrid features and an amalgamated learning model
    Khwaja, Mohammed Kamal
    Vikash, Peddakota
    Arulmozhivarman, P.
    Lui, Simon
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (04) : 895 - 905
  • [38] Sequential Classification Criteria for NNs in Automatic Speech Recognition
    Wang, Guangsen
    Sim, Khe Chai
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 448 - 451
  • [39] Lip movement synthesis in audio-visual speech recognition system
    Li, Junquan
    Yin, Yixin
    Proceedings of the 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE'05), 2005: 461 - 465