A Multi-modal Gesture Recognition System Using Audio, Video, and Skeletal Joint Data

Cited by: 15
Authors
Nandakumar, Karthik [1 ]
Wah, Wan Kong [1 ]
Alice, Chan Siu Man [1 ]
Terence, Ng Wen Zheng [1 ]
Gang, Wang Jian [1 ]
Yun, Yau Wei [1 ]
Affiliation
[1] A*STAR, Institute for Infocomm Research (I2R), 1 Fusionopolis Way, Singapore, Singapore
Keywords
Multi-modal gesture recognition; log-energy features; Mel frequency cepstral coefficients (MFCC); Space-Time Interest Points (STIP); covariance descriptor; Hidden Markov Model (HMM); Support Vector Machine (SVM); fusion; normalization
DOI
10.1145/2522848.2532593
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
This paper describes the gesture recognition system developed by the Institute for Infocomm Research (I2R) for the 2013 ICMI CHALEARN Multi-modal Gesture Recognition Challenge. The proposed system adopts a multi-modal approach for both detecting and recognizing gestures. Automated gesture detection is performed using both audio signals and hand-joint information obtained from the Kinect sensor to segment a sample into individual gestures. Once the gestures are detected and segmented, features extracted from three different modalities, namely, audio, 2-dimensional video (RGB), and skeletal joints (Kinect), are used to classify a given sequence of frames as one of the 20 known gestures or an unrecognized gesture. Mel frequency cepstral coefficients (MFCC) are extracted from the audio signals and a Hidden Markov Model (HMM) is used for classification. While Space-Time Interest Points (STIP) are used to represent the RGB modality, a covariance descriptor is extracted from the skeletal joint data. For both the RGB and Kinect modalities, Support Vector Machines (SVM) are used for gesture classification. Finally, a fusion scheme is applied to accumulate evidence from all three modalities and predict the sequence of gestures in each test sample. The proposed gesture recognition system achieves an average edit distance of 0.2074 over the 275 test samples containing 2,742 unlabeled gestures. While the proposed system recognizes the known gestures with high accuracy, most of the errors are insertion errors, which occur when an unrecognized gesture is misclassified as one of the 20 known gestures.
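The evaluation metric reported above, the average edit distance per test sample, can be sketched as the standard Levenshtein distance between the predicted and ground-truth gesture label sequences, normalized by the ground-truth length. This is a generic illustration of the metric, not code from the paper; all function and variable names are hypothetical.

```python
def edit_distance(pred, truth):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn the predicted label sequence into the ground truth."""
    m, n = len(pred), len(truth)
    # dp[i][j] = edit distance between pred[:i] and truth[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all remaining predicted labels
    for j in range(n + 1):
        dp[0][j] = j  # insert all remaining ground-truth labels
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def normalized_edit_distance(pred, truth):
    """Edit distance normalized by ground-truth sequence length."""
    return edit_distance(pred, truth) / max(len(truth), 1)
```

Under this metric, an "insertion error" of the kind the abstract describes, a spurious extra gesture in the prediction, adds one deletion operation to the distance, which is why misclassifying unrecognized gestures as known ones directly inflates the score.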
Pages: 475 - 482
Page count: 8
Related papers
50 items total
  • [1] Multi-modal Gesture Recognition Using Integrated Model of Motion, Audio and Video
    Goutsu, Yusuke
    Kobayashi, Takaki
    Obara, Junya
    Kusajima, Ikuo
    Takeichi, Kazunari
    Takano, Wataru
    Nakamura, Yoshihiko
    Chinese Journal of Mechanical Engineering, 2015, 28 (04) : 657 - 665
  • [2] Erratum to: Multi-modal Gesture Recognition Using Integrated Model of Motion, Audio and Video
    Goutsu, Yusuke
    Kobayashi, Takaki
    Obara, Junya
    Kusajima, Ikuo
    Takeichi, Kazunari
    Takano, Wataru
    Nakamura, Yoshihiko
    Chinese Journal of Mechanical Engineering, 2017, 30 (06) : 1473 - 1473
  • [3] A Multi Modal Approach to Gesture Recognition from Audio and Video Data
    Bayer, Immanuel
    Silbermann, Thierry
    ICMI'13: Proceedings of the 2013 ACM International Conference on Multimodal Interaction, 2013 : 461 - 465
  • [4] Multi-Modal Emotion Recognition Fusing Video and Audio
    Xu, Chao
    Du, Pufeng
    Feng, Zhiyong
    Meng, Zhaopeng
    Cao, Tianyi
    Dong, Caichao
    Applied Mathematics & Information Sciences, 2013, 7 (02) : 455 - 462
  • [5] Multi-modal Gesture Recognition Using Skeletal Joints and Motion Trail Model
    Liang, Bin
    Zheng, Lihong
    Computer Vision - ECCV 2014 Workshops, Pt I, 2015, 8925 : 623 - 638
  • [6] Multi-modal Learning for Gesture Recognition
    Cao, Congqi
    Zhang, Yifan
    Lu, Hanqing
    2015 IEEE International Conference on Multimedia & Expo (ICME), 2015