Robot Command Interface Using an Audio-Visual Speech Recognition System

Cited by: 0
Authors
Ceballos, Alexander [1, 2]
Gomez, Juan [2]
Prieto, Flavio [3]
Redarce, Tanneguy [4]
Affiliations
[1] Inst Tecnol Metropolitano, Medellin, Colombia
[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia
[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia
[4] Inst Natl Sci Appliquees Lyon, Lyon, France
Keywords
Speech recognition; MPEG-4; manipulator; laparoscopic surgery
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing, and machine vision. Its ultimate goal is to allow human-computer communication by voice while also exploiting the visual information contained in the audio-visual speech signal. This paper presents a system for automatic command recognition using audio-visual information. The system is intended to control the da Vinci laparoscopic robot. The audio signal is parametrized using Mel Frequency Cepstral Coefficients. In addition, visual speech information is extracted using features based on the points that define the outer contour of the mouth according to the MPEG-4 standard.
Pages: 869 / +
Page count: 3