A Multi-modal Gesture Recognition System Using Audio, Video, and Skeletal Joint Data

Cited by: 15
Authors
Nandakumar, Karthik [1]
Wan, Kong Wah [1]
Chan, Siu Man Alice [1]
Ng, Wen Zheng Terence [1]
Wang, Jian Gang [1]
Yau, Wei Yun [1]
Affiliations
[1] Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore
Keywords
Multi-modal gesture recognition; log-energy features; Mel frequency cepstral coefficients (MFCC); Space-Time Interest Points (STIP); covariance descriptor; Hidden Markov Model (HMM); Support Vector Machine (SVM); fusion
DOI
10.1145/2522848.2532593
CLC Number
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
This paper describes the gesture recognition system developed by the Institute for Infocomm Research (I2R) for the 2013 ICMI CHALEARN Multi-modal Gesture Recognition Challenge. The proposed system adopts a multi-modal approach for both detecting and recognizing gestures. Automated gesture detection uses audio signals together with hand-joint information from the Kinect sensor to segment a sample into individual gestures. Once the gestures are detected and segmented, features extracted from three modalities, namely audio, 2-dimensional video (RGB), and skeletal joints (Kinect), are used to classify a given sequence of frames as one of the 20 known gestures or as an unrecognized gesture. Mel frequency cepstral coefficients (MFCC) are extracted from the audio signals and a Hidden Markov Model (HMM) is used for classification. Space-Time Interest Points (STIP) represent the RGB modality, while a covariance descriptor is extracted from the skeletal joint data; for both the RGB and Kinect modalities, Support Vector Machines (SVM) are used for gesture classification. Finally, a fusion scheme accumulates evidence from all three modalities and predicts the sequence of gestures in each test sample. The proposed system achieves an average edit distance of 0.2074 over the 275 test samples containing 2,742 unlabeled gestures. While it recognizes the known gestures with high accuracy, most of the errors are insertions, which occur when an unrecognized gesture is misclassified as one of the 20 known gestures.
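As an illustrative aside (not part of the original record), the two less common building blocks named in the abstract can be sketched in a few lines of Python: a covariance descriptor computed over the skeletal joint frames of a segmented gesture, and a weighted score-level fusion of per-class scores from the three modalities. The joint dimensionality, class count, fusion weights, and score normalisation below are all assumptions made for the example; this record does not specify them.

import numpy as np

def covariance_descriptor(joint_frames):
    """Fixed-length covariance descriptor for one segmented gesture.

    joint_frames: (T, D) array with one row per frame; D is the
    flattened joint-coordinate dimension (assumed here to be
    20 Kinect joints x 3 coordinates = 60). Gestures of different
    lengths T all map to the same-size vector, which is what makes
    the descriptor usable as an SVM input.
    """
    centered = joint_frames - joint_frames.mean(axis=0)
    cov = centered.T @ centered / max(len(joint_frames) - 1, 1)
    # The covariance matrix is symmetric, so the upper triangle
    # (including the diagonal) carries all of its information.
    return cov[np.triu_indices(cov.shape[0])]

def fuse_scores(per_modality_scores, weights):
    """Weighted-sum fusion of per-class scores from several modalities.

    per_modality_scores: list of (num_classes,) arrays, e.g. audio HMM
    log-likelihoods and RGB/skeleton SVM scores, assumed to have been
    normalised to a comparable range beforehand.
    """
    scores = np.stack(per_modality_scores)        # (M, num_classes)
    w = np.asarray(weights, dtype=float)[:, None] # (M, 1)
    return (w * scores).sum(axis=0)

# Toy usage: one 45-frame gesture with 60-D skeleton frames, and
# 21 classes (20 known gestures + 1 "unrecognized"). The weights
# are placeholders, not values reported by the paper.
rng = np.random.default_rng(0)
descriptor = covariance_descriptor(rng.normal(size=(45, 60)))
fused = fuse_scores([rng.normal(size=21) for _ in range(3)],
                    weights=[0.5, 0.25, 0.25])
predicted_class = int(np.argmax(fused))

The appeal of the covariance descriptor in this setting is that the upper-triangle vectorisation turns variable-length joint sequences into fixed-size feature vectors, so a standard SVM can be trained on them without temporal alignment.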
Pages: 475-482 (8 pages)
Related Papers (showing items [41]-[50] of 50)
  • [41] Multi-modal zero-shot dynamic hand gesture recognition
    Rastgoo, Razieh
    Kiani, Kourosh
    Escalera, Sergio
    Sabokrou, Mohammad
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 247
  • [42] Adaptive cross-fusion learning for multi-modal gesture recognition
    Zhou, Benjia
    Wan, Jun
    Liang, Yanyan
    Guo, Guodong
    VIRTUAL REALITY & INTELLIGENT HARDWARE, 2021, 3 (03) : 235 - 247
  • [43] Calibration of audio-video sensors for multi-modal event indexing
    Kuehnapfel, Thorsten
    Tan, Tele
    Venkatesh, Svetha
    Lehmann, Eric
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PTS 1-3, 2007 : 741+
  • [44] Dynamic Hand Gesture Recognition from Multi-modal Streams Using Deep Neural Network
    Tran, Thanh-Hai
    Tran, Hoang-Nhat
    Doan, Huong-Giang
    MULTI-DISCIPLINARY TRENDS IN ARTIFICIAL INTELLIGENCE, 2019, 11909 : 156 - 167
  • [45] Multi-modal audio-visual event recognition for football analysis
    Barnard, M
    Odobez, JM
    Bengio, S
    2003 IEEE XIII WORKSHOP ON NEURAL NETWORKS FOR SIGNAL PROCESSING - NNSP'03, 2003 : 469 - 478
  • [46] Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition
    Yu, Zitong
    Zhou, Benjia
    Wan, Jun
    Wang, Pichao
    Chen, Haoyu
    Liu, Xin
    Li, Stan Z.
    Zhao, Guoying
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 5626 - 5640
  • [47] AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
    Panda, Rameswar
    Chen, Chun-Fu
    Fan, Quanfu
    Sun, Ximeng
    Saenko, Kate
    Oliva, Aude
    Feris, Rogerio
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 7556 - 7565
  • [48] Gesture recognition with hidden Markov models to enable multi-modal haptic feedback
    Frolov, Vadim
    Deml, Barbara
    Hannig, Gunter
    HAPTICS: PERCEPTION, DEVICES AND SCENARIOS, PROCEEDINGS, 2008, 5024 : 786+
  • [49] Multi-modal user interface combining eye tracking and hand gesture recognition
    Kim, Hansol
    Suh, Kun Ha
    Lee, Eui Chul
    JOURNAL ON MULTIMODAL USER INTERFACES, 2017, 11 (03) : 241 - 250
  • [50] Multi-modal user interaction method based on gaze tracking and gesture recognition
    Lee, Heekyung
    Lim, Seong Yong
    Lee, Injae
    Cha, Jihun
    Cho, Dong-Chan
    Cho, Sunyoung
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2013, 28 (02) : 114 - 126