Robot Command Interface Using an Audio-Visual Speech Recognition System

被引：0

作者：

Ceballos, Alexander ^{[1
,2
]}

Gomez, Juan ^{[2
]}

Prieto, Flavio ^{[3
]}

Redarce, Tanneguy ^{[4
]}

机构：

[1] Inst Tecnol Metropolitano, Medellin, Colombia

[2] Univ Nacional Colombia Sede Manizales, DIEEC, Manizales, Colombia

[3] Univ Nacional Colombia Sede Bogota, DIMM, Bogota, Colombia

[4] Inst Natl Sci Appliquees Lyo, Lyon, France

来源：

PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, PROCEEDINGS | 2009年 / 5856卷

关键词：

Speech recognition; MPEG-4; manipulator; LAPAROSCOPIC SURGERY;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents a command's automatic recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. Besides, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.

引用

页码：869 / +

页数：3

共 50 条

[21] Multimodal Learning Using 3D Audio-Visual Data or Audio-Visual Speech Recognition
Su, Rongfeng
Wang, Lan
Liu, Xunying
2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 40 - 43
[22] Lip movement synthesis in audio-visual speech recognition system
Li, Junquan
Yin, Yixin
Proc. 2005 IEEE Int. Conf. on Lang. Process. Knowl. Engin. IEEE NLP-KE '05, (461-465):
[23] Audio-Visual Robot Command Recognition D-META'12 Grand Challenge
Sanchez-Riera, Jordi
Alameda-Pineda, Xavier
Horaud, Radu
ICMI '12: PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2012, : 371 - 378
[24] Audio-Visual Speech Modeling for Continuous Speech Recognition
Dupont, Stephane
Luettin, Juergen
IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 141 - 151
[25] Enhancing Quality and Accuracy of Speech Recognition System by Using Multimodal Audio-Visual Speech signal
El Maghraby, Eslam E.
Gody, Amr M.
Farouk, M. Hesham
ICENCO 2016 - 2016 12TH INTERNATIONAL COMPUTER ENGINEERING CONFERENCE (ICENCO) - BOUNDLESS SMART SOCIETIES, 2016, : 219 - 229
[26] Large vocabulary audio-visual speech recognition using the Janus speech recognition toolkit
Kratt, J
Metze, F
Stiefelhagen, R
Waibel, A
PATTERN RECOGNITION, 2004, 3175 : 488 - 495
[27] Audio-Visual Speech Recognition for Human-Robot Interaction: a Feasibility Study
Goetzee, Sander
Mihhailov, Konstantin
van de laar, Roel
Baraka, Kim
Hindriks, Koen V.
2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, ROMAN 2024, 2024, : 930 - 935
[28] Speaker independent audio-visual speech recognition
Zhang, Y
Levinson, S
Huang, T
2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 1073 - 1076
[29] A coupled HMM for audio-visual speech recognition
Nefian, AV
Liang, LH
Pi, XB
Xiaoxiang, L
Mao, C
Murphy, K
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 2013 - 2016
[30] An asynchronous DBN for audio-visual speech recognition
Saenko, Kate
Livescu, Karen
2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 154 - +

← 1 2 3 4 5 →