A Multi-modal Gesture Recognition System Using Audio, Video, and Skeletal Joint Data

Cited by: 15
Authors
Nandakumar, Karthik [1 ]
Wah, Wan Kong [1 ]
Alice, Chan Siu Man [1 ]
Terence, Ng Wen Zheng [1 ]
Gang, Wang Jian [1 ]
Yun, Yau Wei [1 ]
Affiliations
[1] A*STAR, Institute for Infocomm Research (I2R), 1 Fusionopolis Way, Singapore
Keywords
Multi-modal gesture recognition; log-energy features; Mel frequency cepstral coefficients (MFCC); Space-Time Interest Points (STIP); covariance descriptor; Hidden Markov Model (HMM); Support Vector Machine (SVM); fusion; normalization
DOI
10.1145/2522848.2532593
CLC number
TP301 [Theory, Methods]
Discipline code
081202
Abstract
This paper describes the gesture recognition system developed by the Institute for Infocomm Research (I2R) for the 2013 ICMI CHALEARN Multi-modal Gesture Recognition Challenge. The proposed system adopts a multi-modal approach for detecting as well as recognizing the gestures. Automated gesture detection is performed using both audio signals and information about hand joints obtained from the Kinect sensor to segment a sample into individual gestures. Once the gestures are detected and segmented, features extracted from three different modalities, namely, audio, 2-dimensional video (RGB), and skeletal joints (Kinect), are used to classify a given sequence of frames into one of the 20 known gestures or an unrecognized gesture. Mel frequency cepstral coefficients (MFCC) are extracted from the audio signals and a Hidden Markov Model (HMM) is used for classification. While Space-Time Interest Points (STIP) are used to represent the RGB modality, a covariance descriptor is extracted from the skeletal joint data. For both the RGB and Kinect modalities, Support Vector Machines (SVM) are used for gesture classification. Finally, a fusion scheme is applied to accumulate evidence from all three modalities and predict the sequence of gestures in each test sample. The proposed gesture recognition system achieves an average edit distance of 0.2074 over the 275 test samples containing 2,742 unlabeled gestures. While the proposed system recognizes the known gestures with high accuracy, most of the errors are caused by insertions, which occur when an unrecognized gesture is misclassified as one of the 20 known gestures.
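The evaluation metric reported above, the average edit distance between predicted and ground-truth gesture label sequences, can be sketched as a standard Levenshtein distance over sequences of gesture IDs, normalized by the length of the ground-truth sequence. The function names and the normalization convention here are assumptions for illustration, not code from the paper.

```python
def edit_distance(pred, truth):
    """Levenshtein distance between two gesture label sequences,
    computed with classic dynamic programming."""
    m, n = len(pred), len(truth)
    # dp[i][j] = edit distance between pred[:i] and truth[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of pred[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of truth[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[m][n]

def normalized_edit_distance(pred, truth):
    """Edit distance divided by the ground-truth length; averaging this
    over all test samples yields a score comparable to the one reported."""
    return edit_distance(pred, truth) / len(truth)
```

Under this metric, an insertion error (an unrecognized gesture labeled as one of the 20 known classes) contributes one unit of distance, which is why the abstract singles insertions out as the dominant error source.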
Pages: 475 - 482
Page count: 8
Related Papers
50 records
  • [31] Bayesian Co-Boosting for Multi-modal Gesture Recognition
    Wu, Jiaxiang
    Cheng, Jian
    JOURNAL OF MACHINE LEARNING RESEARCH, 2014, 15 : 3013 - 3036
  • [33] Multi-modal Gesture Recognition Challenge 2013: Dataset and Results
    Escalera, Sergio
    Gonzalez, Jordi
    Baro, Xavier
    Reyes, Miguel
    Lopes, Oscar
    Guyon, Isabelle
    Athitsos, Vassilis
    Escalante, Hugo J.
    ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 445 - 452
  • [34] Mudra: A Multi-Modal Smartwatch Interactive System with Hand Gesture Recognition and User Identification
    Guo, Kaiwen
    Zhou, Hao
    Tian, Ye
    Zhou, Wangqiu
    Ji, Yusheng
    Li, Xiang-Yang
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2022), 2022, : 100 - 109
  • [35] Nonparametric Feature Matching Based Conditional Random Fields for Gesture Recognition from Multi-Modal Video
    Chang, Ju Yong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (08) : 1612 - 1625
  • [36] An enhanced artificial neural network for hand gesture recognition using multi-modal features
    Uke, Shailaja N.
    Zade, Amol V.
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING-IMAGING AND VISUALIZATION, 2023, 11 (06): : 2278 - 2289
  • [37] Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
    Fan, Dinghao
    Lu, Hengjie
    Xu, Shugong
    Cao, Shan
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 27026 - 27036
  • [38] A comprehensive video dataset for multi-modal recognition systems
    Handa, A.
    Agarwal, R.
    Kohli, N.
    Data Science Journal, 2019, 18 (01):
  • [39] A Multi-modal System for Video Semantic Understanding
    Lv, Zhengwei
    Lei, Tao
    Liang, Xiao
    Shi, Zhizhong
    Liu, Duoxing
    CCKS 2021 - EVALUATION TRACK, 2022, 1553 : 34 - 43
  • [40] Adaptive cross-fusion learning for multi-modal gesture recognition
    Zhou, Benjia
    Wan, Jun
    Liang, Yanyan
    Guo, Guodong
    Virtual Reality and Intelligent Hardware, 2021, 3 (03): : 235 - 247