Automatic Speechreading with Applications to Human-Computer Interfaces

Cited by: 0
Authors
Xiaozheng Zhang
Charles C. Broun
Russell M. Mersereau
Mark A. Clements
Affiliations
[1] Georgia Institute of Technology,Center for Signal and Image Processing
[2] Motorola Human Interface Lab
Keywords
automatic speechreading; visual feature extraction; Markov random fields; hidden Markov models; polynomial classifier; speech recognition; speaker verification
DOI: not available
Abstract
There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Because speech is inherently multimodal, the visual component yields information that is not always present in the acoustic signal and can improve system performance over acoustic-only methods, especially in noisy environments. In this paper, we investigate the usefulness of visual speech information in HCI-related applications. We first introduce a new algorithm that automatically locates the mouth region using color and motion information and segments the lip region using both color and edge information within a Markov random field framework. We then derive a relevant set of visual speech parameters and incorporate them into a recognition engine. We compare the performance of various visual features, including the lip inner contour and the visibility of the tongue and teeth, to explore their impact on recognition accuracy. Using a common visual feature set, we demonstrate two applications that exploit speechreading in a joint audio-visual speech signal processing task: speech recognition and speaker verification. Experimental results on two databases show that the visual information is highly effective for improving recognition performance over a variety of acoustic noise levels.
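The mouth-localization step described above combines a color cue with a motion cue. As a minimal illustrative sketch only (not the authors' actual algorithm, whose segmentation stage uses Markov random fields and edge information), the idea can be approximated by intersecting a red-dominance mask with a frame-difference mask; the function name and thresholds below are hypothetical:

```python
import numpy as np

def mouth_region(frame, prev_frame, red_thresh=1.15, motion_thresh=10):
    """Candidate mouth bounding box from color and motion cues (illustrative).

    frame, prev_frame: HxWx3 uint8 RGB images of the same size.
    Returns (rmin, rmax, cmin, cmax) or None if no candidate pixels are found.
    """
    f = frame.astype(np.float32)
    # Color cue: lip pixels tend to be redder than surrounding skin.
    # This simple red/green ratio test stands in for the paper's color model.
    color_mask = f[..., 0] > red_thresh * (f[..., 1] + 1.0)
    # Motion cue: speaking produces frame-to-frame change around the mouth.
    diff = np.abs(f - prev_frame.astype(np.float32)).sum(axis=-1)
    motion_mask = diff > motion_thresh
    mask = color_mask & motion_mask
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())
```

In practice such a bounding box would only seed the finer lip-contour segmentation; the paper refines it with an MRF formulation over color and edge features.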
Related papers (50 total)
  • [31] VISUAL-SEARCH IN MODERN HUMAN-COMPUTER INTERFACES
    SCOTT, D
    BEHAVIOUR & INFORMATION TECHNOLOGY, 1993, 12 (03) : 174 - 189
  • [32] Multiculturalism and Human-Computer Interaction: User Interfaces for Immigrants
    Bobeth, Jan
    Deutsch, Stephanie
    Tscheligi, Manfred
    MENSCH & COMPUTER 2013 - WORKSHOPBAND: INTERAKTIVE VIELFALT. INTERACTIVE DIVERSITY, 2013, : 495 - 498
  • [33] Audiovisual Analysis and Synthesis for Multimodal Human-Computer Interfaces
    Sevillano, Xavier
    Melenchon, Javier
    Cobo, German
    Claudi Socoro, Joan
    Alias, Francesc
    ENGINEERING THE USER INTERFACE: FROM RESEARCH TO PRACTICE, 2009, : 179 - 194
  • [34] Cognitive human-computer communication by means of haptic interfaces
    Avizzano, Carlo Alberto
    2007 RO-MAN: 16TH IEEE INTERNATIONAL SYMPOSIUM ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, VOLS 1-3, 2007, : 75 - 80
  • [35] A new approach to perceptual assessment of human-computer interfaces
    Rizzi, Alessandro
    Fogli, Daniela
    Barricelli, Barbara Rita
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (05) : 7381 - 7399
  • [36] Challenges in speech-based human-computer interfaces
    Minker, Wolfgang
    Pittermann, Johannes
    Pittermann, Angela
    Strauss, Petra-Maria
    Buehler, Dirk
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2007, 10 (2-3) : 109 - 119
  • [37] Eye tracking for evaluating industrial human-computer interfaces
    Zülch, G
    Stowasser, S
    MIND'S EYE: COGNITIVE AND APPLIED ASPECTS OF EYE MOVEMENT RESEARCH, 2003, : 531 - 553
  • [38] Natural Human-Computer Interfaces' Paradigm and Cognitive Ergonomics
    Almeida, Victor M.
    Rafael, Sonia
    Neves, Marco
    ADVANCES IN ERGONOMICS IN DESIGN, 2020, 955 : 220 - 227
  • [39] Implementation Goals for Multimodal Interfaces in Human-Computer Interaction
    Rafael, Sonia
    Almeida, Victor M.
    HUMAN-COMPUTER INTERACTION: THEORY, METHODS AND TOOLS, HCII 2021, PT I, 2021, 12762 : 230 - 239
  • [40] Intelligent support mechanisms in adaptable human-computer interfaces
    Spath, D.
    Weule, H.
    CIRP Annals - Manufacturing Technology, 1993, 42 (01) : 519 - 522