Robot Command Interface Using an Audio-Visual Speech Recognition System

被引:0
|
作者
Ceballos, Alexander [1 ,2 ]
Gomez, Juan [2 ]
Prieto, Flavio [3 ]
Redarce, Tanneguy [4 ]
机构
[1] Inst Tecnol Metropolitano, Medellin, Colombia
[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia
[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia
[4] Inst Natl Sci Appliquees Lyo, Lyon, France
关键词
Speech recognition; MPEG-4; manipulator; LAPAROSCOPIC SURGERY;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents a command's automatic recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. Besides, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.
引用
收藏
页码:869 / +
页数:3
相关论文
共 50 条
  • [41] Speaker independent audio-visual continuous speech recognition
    Liang, LH
    Liu, XX
    Zhao, YB
    Pi, XB
    Nefian, AV
    IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : A25 - A28
  • [42] Audio-visual fuzzy fusion for robust speech recognition
    Malcangi, M.
    Ouazzane, K.
    Patel, P.
    2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [43] Building a data corpus for audio-visual speech recognition
    Chitu, Alin G.
    Rothkrantz, Leon J. M.
    EUROMEDIA '2007, 2007, : 88 - 92
  • [44] Audio-Visual Speech Recognition in the Presence of a Competing Speaker
    Shao, Xu
    Barker, Jon
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1292 - 1295
  • [45] Audio-Visual Automatic Speech Recognition for Connected Digits
    Wang, Xiaoping
    Hao, Yufeng
    Fu, Degang
    Yuan, Chunwei
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL III, PROCEEDINGS, 2008, : 328 - +
  • [46] DARE: Deceiving Audio-Visual speech Recognition model
    Mishra, Saumya
    Gupta, Anup Kumar
    Gupta, Puneet
    KNOWLEDGE-BASED SYSTEMS, 2021, 232
  • [47] Relevant feature selection for audio-visual speech recognition
    Drugman, Thomas
    Gurban, Mihai
    Thiran, Jean-Philippe
    2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2007, : 179 - +
  • [48] Dynamic Bayesian Networks for Audio-Visual Speech Recognition
    Ara V. Nefian
    Luhong Liang
    Xiaobo Pi
    Xiaoxing Liu
    Kevin Murphy
    EURASIP Journal on Advances in Signal Processing, 2002
  • [49] DEEP MULTIMODAL LEARNING FOR AUDIO-VISUAL SPEECH RECOGNITION
    Mroueh, Youssef
    Marcheret, Etienne
    Goel, Vaibhava
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 2130 - 2134
  • [50] Connectionism based audio-visual speech recognition method
    Che, Na
    Zhu, Yi-Ming
    Zhao, Jian
    Sun, Lei
    Shi, Li-Juan
    Zeng, Xian-Wei
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2024, 54 (10): : 2984 - 2993