A hybrid approach for automatic lip localization and viseme classification to enhance visual speech recognition

Cited: 0
Authors
Multimedia Information Systems and Advanced Computing Laboratory, High Institute of Computer Science and Multimedia, University of Sfax, Sfax, Tunisia [1 ]
Institutions
Source
Integr. Comput. Aided Eng. | 2008 / Vol. 3 / 253-266
Keywords
Extraction; Audition; Speech recognition
DOI
10.3233/ica-2008-15305
Abstract
An automatic lip-reading system is an assistive technology for hearing-impaired or elderly people: one can imagine, for example, a dependent person commanding a machine with a simple lip movement or by pronouncing a single viseme (visual phoneme). A lip-reading system decomposes into three subsystems: a lip localization subsystem, a feature extraction subsystem, and a classification subsystem that maps feature vectors to visemes. The major difficulty in a lip-reading system is the extraction of the visual speech descriptors, which requires automatic localization and tracking of the labial gestures. In this paper, we present a new automatic approach for localizing lip points of interest (POI) and extracting features on a speaker's face, based on mouth color information and a geometrical model of the lips. The extracted visual information is then classified in order to recognize the uttered viseme. We have developed our Automatic Lip Feature Extraction (ALiFE) prototype and evaluated it for multiple speakers under natural conditions. Experiments cover a group of French visemes uttered by different speakers. Results show that our system recognizes 94.64% of the tested French visemes. © 2008 - IOS Press and the author(s). All rights reserved.
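The first stage of the pipeline described above, color-based lip localization, can be sketched as follows. This is an illustrative toy, not the paper's actual algorithm: the pseudo-hue measure r/(r+g), the threshold value, and the synthetic test frame are all assumptions chosen only to show how a mouth region of interest might be isolated by color before geometric modeling and feature extraction.

```python
import numpy as np

def localize_lips(rgb, hue_thresh=0.6):
    """Locate a candidate mouth region of interest by color.

    Lip pixels are typically redder than surrounding skin, so the
    pseudo-hue r / (r + g) tends to separate them. This measure and
    the threshold are illustrative assumptions, not the paper's method.
    Returns a bounding box (top, bottom, left, right), or None if no
    pixel exceeds the threshold.
    """
    rgb = rgb.astype(np.float64)
    r, g = rgb[..., 0], rgb[..., 1]
    pseudo_hue = r / np.maximum(r + g, 1e-6)   # avoid division by zero
    mask = pseudo_hue > hue_thresh
    if not mask.any():
        return None
    rows = np.where(mask.any(axis=1))[0]       # rows containing lip pixels
    cols = np.where(mask.any(axis=0))[0]       # columns containing lip pixels
    return (int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1]))

# Synthetic test frame: skin-colored background with a redder "mouth" patch.
frame = np.full((60, 80, 3), (200, 150, 120), dtype=np.uint8)
frame[30:40, 25:55] = (180, 60, 60)            # lip-colored block
print(localize_lips(frame))                    # -> (30, 39, 25, 54)
```

In a full system, the bounding box would seed the geometrical lip model for POI tracking, and per-frame features derived from it would feed the viseme classifier.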
Related Papers
50 records in total
  • [41] Analysis of lip geometric features for audio-visual speech recognition
    Kaynak, MN
    Zhi, Q
    Cheok, AD
    Sengupta, K
    Han, Z
    Chung, KC
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2004, 34 (04): : 564 - 570
  • [42] Vietnamese automatic speech recognition: The FLaVoR approach
    Vu, Quan
    Demuynck, Kris
    Van Compernolle, Dirk
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 464 - +
  • [43] Hybrid-task learning for robust automatic speech recognition
    Pironkov, Gueorgui
    Wood, Sean U. N.
    Dupont, Stephane
    COMPUTER SPEECH AND LANGUAGE, 2020, 64
  • [44] Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition
    Sivasankaran, Sunit
    Vincent, Emmanuel
    Fohr, Dominique
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 346 - 350
  • [45] Appearance and shape-based hybrid visual feature extraction: toward audio-visual automatic speech recognition
    Debnath, Saswati
    Roy, Pinki
    SIGNAL IMAGE AND VIDEO PROCESSING, 2021, 15 (01) : 25 - 32
  • [46] Visual-speech-pass filtering for robust automatic lip-reading
    Jong-Seok Lee
    Pattern Analysis and Applications, 2014, 17 : 611 - 621
  • [48] An audio-visual corpus for speech perception and automatic speech recognition (L)
    Cooke, Martin
    Barker, Jon
    Cunningham, Stuart
    Shao, Xu
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (05): : 2421 - 2424
  • [49] Indonesian Audio-Visual Speech Corpus for Multimodal Automatic Speech Recognition
    Maulana, Muhammad Rizki Aulia Rahman
    Fanany, Mohamad Ivan
    2017 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2017, : 381 - 385
  • [50] Speech Command Classification System for Sinhala Language based on Automatic Speech Recognition
    Dinushika, Thilini
    Kavmini, Lakshika
    Abeyawardhana, Pamoda
    Thayasivam, Uthayasanker
    Jayasena, Sanath
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 205 - 210