Robust Human Action Recognition Using Global Spatial-Temporal Attention for Human Skeleton Data

Cited: 0
Authors
Han, Yun [1 ,2 ]
Chung, Sheng-Luen [1 ]
Ambikapathi, ArulMurugan [3 ]
Chan, Jui-Shan [1 ]
Lin, Wei-You [1 ]
Su, Shun-Feng [1 ]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Taipei, Taiwan
[2] Neijiang Normal Univ, Neijiang, Peoples R China
[3] UTECHZONE, Taipei, Taiwan
Keywords
Human action recognition; global attention model; accumulative learning curve; action recognition; LSTM; spatial-temporal attention
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Human action recognition from video sequences is one of the most challenging computer vision applications, primarily owing to intrinsic variations in lighting, pose, occlusion, and other factors. The human skeleton joints extracted by the Kinect depth camera have the advantages of a simplified structure and rich content, and are therefore widely used for capturing human actions. However, most current skeleton-based deep learning action recognition methods treat all skeletal joints equally in both the spatial and temporal dimensions. This does not accord with the fact that, for different human actions, the contributions of the skeletal joints can vary significantly both spatially and temporally. Incorporating information about such natural variations will certainly aid in designing a robust human action recognition system. Hence, in this work, we propose a global spatial attention (GSA) model that expresses the different skeletal joints with different weights so as to provide precise spatial information for human action recognition. Further, we introduce the notion of an accumulative learning curve (ALC) model that highlights which frames contribute most to the final decision by assigning varying temporal weights to the intermediate accumulated learning results produced by an LSTM over the input frames. The proposed GSA (for spatial information) and ALC (for temporal processing) models are integrated into the LSTM framework to construct a robust action recognition framework that takes the human skeletal joints as input and predicts the human action using the enhanced spatial-temporal attention model. Rigorous experiments on the NTU dataset (by far the largest benchmark RGB-D dataset) show that the proposed framework offers the best accuracy and the lowest algorithmic complexity and training overhead when compared with other state-of-the-art human action recognition models.
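The spatial-temporal attention idea described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: a plain tanh RNN stands in for their LSTM, all weights and the input skeleton sequence are random rather than learned, and the dimensions (25 joints, as in NTU skeleton data) are illustrative assumptions. It shows only the data flow: per-joint weights (GSA) scale the spatial input, and per-frame weights (ALC) combine the recurrent encoder's intermediate results into the final decision.

```python
import numpy as np

rng = np.random.default_rng(0)

T, J, C = 20, 25, 3      # frames, skeletal joints, xyz coordinates
H, A = 32, 10            # hidden size, number of action classes

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# --- Global spatial attention (GSA): one weight per joint, shared
# across all frames (hence "global"); softmax makes them sum to 1.
joint_w = softmax(rng.standard_normal(J))        # shape (J,)

skeleton = rng.standard_normal((T, J, C))        # toy skeleton sequence
frames = (skeleton * joint_w[None, :, None]).reshape(T, J * C)

# --- Recurrent encoder (a tanh RNN standing in for the LSTM),
# keeping every intermediate hidden state as an accumulated result.
W_in = rng.standard_normal((J * C, H)) * 0.1
W_h = rng.standard_normal((H, H)) * 0.1
h = np.zeros(H)
states = []
for t in range(T):
    h = np.tanh(frames[t] @ W_in + h @ W_h)
    states.append(h)
states = np.stack(states)                        # shape (T, H)

# --- Accumulative learning curve (ALC): a temporal weight per frame
# decides how much each intermediate result contributes to the decision.
frame_w = softmax(rng.standard_normal(T))        # shape (T,)
pooled = frame_w @ states                        # weighted sum over time

W_out = rng.standard_normal((H, A)) * 0.1
probs = softmax(pooled @ W_out)                  # action class probabilities
pred = int(probs.argmax())
```

In a trained model the GSA logits, ALC logits, and projection matrices would be learned jointly with the LSTM, so that informative joints and decisive frames receive larger weights.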
Pages: 8