Robot Command Interface Using an Audio-Visual Speech Recognition System

被引:0
|
作者
Ceballos, Alexander [1 ,2 ]
Gomez, Juan [2 ]
Prieto, Flavio [3 ]
Redarce, Tanneguy [4 ]
机构
[1] Inst Tecnol Metropolitano, Medellin, Colombia
[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia
[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia
[4] Inst Natl Sci Appliquees Lyo, Lyon, France
关键词
Speech recognition; MPEG-4; manipulator; LAPAROSCOPIC SURGERY;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents a command's automatic recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. Besides, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.
引用
收藏
页码:869 / +
页数:3
相关论文
共 50 条
  • [21] Multimodal Learning Using 3D Audio-Visual Data or Audio-Visual Speech Recognition
    Su, Rongfeng
    Wang, Lan
    Liu, Xunying
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 40 - 43
  • [22] Lip movement synthesis in audio-visual speech recognition system
    Li, Junquan
    Yin, Yixin
    Proc. 2005 IEEE Int. Conf. on Lang. Process. Knowl. Engin. IEEE NLP-KE '05, (461-465):
  • [23] Audio-Visual Robot Command Recognition D-META'12 Grand Challenge
    Sanchez-Riera, Jordi
    Alameda-Pineda, Xavier
    Horaud, Radu
    ICMI '12: PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2012, : 371 - 378
  • [24] Audio-Visual Speech Modeling for Continuous Speech Recognition
    Dupont, Stephane
    Luettin, Juergen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 141 - 151
  • [25] Enhancing Quality and Accuracy of Speech Recognition System by Using Multimodal Audio-Visual Speech signal
    El Maghraby, Eslam E.
    Gody, Amr M.
    Farouk, M. Hesham
    ICENCO 2016 - 2016 12TH INTERNATIONAL COMPUTER ENGINEERING CONFERENCE (ICENCO) - BOUNDLESS SMART SOCIETIES, 2016, : 219 - 229
  • [26] Large vocabulary audio-visual speech recognition using the Janus speech recognition toolkit
    Kratt, J
    Metze, F
    Stiefelhagen, R
    Waibel, A
    PATTERN RECOGNITION, 2004, 3175 : 488 - 495
  • [27] Audio-Visual Speech Recognition for Human-Robot Interaction: a Feasibility Study
    Goetzee, Sander
    Mihhailov, Konstantin
    van de laar, Roel
    Baraka, Kim
    Hindriks, Koen V.
    2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, ROMAN 2024, 2024, : 930 - 935
  • [28] Speaker independent audio-visual speech recognition
    Zhang, Y
    Levinson, S
    Huang, T
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 1073 - 1076
  • [29] A coupled HMM for audio-visual speech recognition
    Nefian, AV
    Liang, LH
    Pi, XB
    Xiaoxiang, L
    Mao, C
    Murphy, K
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 2013 - 2016
  • [30] An asynchronous DBN for audio-visual speech recognition
    Saenko, Kate
    Livescu, Karen
    2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 154 - +