Robust Human Action Recognition Using Global Spatial-Temporal Attention for Human Skeleton Data

Cited: 0
Authors
Han, Yun [1 ,2 ]
Chung, Sheng-Luen [1 ]
Ambikapathi, ArulMurugan [3 ]
Chan, Jui-Shan [1 ]
Lin, Wei-You [1 ]
Su, Shun-Feng [1 ]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Taipei, Taiwan
[2] Neijiang Normal Univ, Neijiang, Peoples R China
[3] UTECHZONE, Taipei, Taiwan
Source
2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
Keywords
Human action recognition; global attention model; accumulative learning curve; action recognition; LSTM; spatial-temporal attention;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Human action recognition from video sequences is one of the most challenging computer vision applications, primarily owing to intrinsic variations in lighting, pose, occlusions, and other factors. The human skeleton joints extracted by the depth camera Kinect have the advantages of simplified structure and rich content, and are therefore widely used for capturing human actions. However, at present, most skeletal-joint and deep-learning-based action recognition methods treat all skeletal joints equally in both the spatial and temporal dimensions. This does not accord with the fact that, for different human actions, the contributions of skeletal joints can vary significantly both spatially and temporally. Incorporating information pertaining to such natural variations will certainly aid in designing a robust human action recognition system. Hence, in this work, we propose a global spatial attention (GSA) model that assigns different weights to different skeletal joints so as to provide precise spatial information for human action recognition. Further, we introduce the notion of an accumulative learning curve (ALC) model that highlights which frames contribute most to the final decision by assigning varying temporal weights to the intermediate accumulated learning results produced by an LSTM over the input frames. The proposed GSA (for spatial information) and ALC (for temporal processing) models are integrated into the LSTM framework to construct a robust action recognition framework that takes the human skeletal joints as input and predicts the human action using the enhanced spatial-temporal attention model. Rigorous experiments on the NTU dataset (by far the largest benchmark RGB-D dataset) show that the proposed framework offers the best recognition accuracy with the least algorithmic complexity and training overhead when compared with other state-of-the-art human action recognition models.
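The two mechanisms described in the abstract can be illustrated with a minimal NumPy sketch: a softmax-normalized weight per joint (the GSA idea) re-weights the skeleton input, and a softmax-normalized weight per frame (the ALC idea) pools the per-frame recurrent outputs. All dimensions, weights, and the simple recurrence below are illustrative stand-ins, not the paper's actual architecture or learned parameters; a real model would use a trained LSTM cell and learned attention parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

T, J, D, H = 30, 25, 3, 16  # frames, joints (Kinect v2 tracks 25), coords, hidden size

# Toy skeleton sequence: T frames x J joints x 3-D coordinates.
seq = rng.standard_normal((T, J, D))

# --- Global spatial attention (GSA), as sketched: one weight per joint,
# softmax-normalized so informative joints dominate. Random stand-ins for
# learned parameters.
joint_logits = rng.standard_normal(J)
alpha = np.exp(joint_logits) / np.exp(joint_logits).sum()  # spatial weights, sum to 1
weighted = seq * alpha[None, :, None]                      # re-weight each joint

# --- Recurrent stand-in for the LSTM: one hidden state per frame, each an
# intermediate "accumulated learning result" over the frames seen so far.
W = rng.standard_normal((J * D, H)) * 0.1
U = rng.standard_normal((H, H)) * 0.1
h = np.zeros(H)
hidden = []
for t in range(T):
    h = np.tanh(weighted[t].reshape(-1) @ W + h @ U)
    hidden.append(h)
hidden = np.stack(hidden)                                  # shape (T, H)

# --- Accumulative learning curve (ALC), as sketched: temporal weights over
# the intermediate per-frame results, so decisive frames contribute more.
frame_logits = rng.standard_normal(T)
beta = np.exp(frame_logits) / np.exp(frame_logits).sum()   # temporal weights, sum to 1
summary = beta @ hidden                                    # pooled (H,) representation

print(summary.shape)  # (16,)
```

A classifier head (e.g. a softmax over action classes) would then consume `summary`; the point of the sketch is only that spatial weighting happens before the recurrence and temporal weighting happens over its intermediate outputs.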
Pages: 8
Related Papers
50 records total
  • [41] StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
    He, Dongliang
    Zhou, Zhichao
    Gan, Chuang
    Li, Fu
    Liu, Xiao
    Li, Yandong
    Wang, Limin
    Wen, Shilei
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8401 - 8408
  • [42] Human Action Recognition by Fusion of Convolutional Neural Networks and Spatial-Temporal Information
    Li, Weisheng
    Ding, Yahui
    8TH INTERNATIONAL CONFERENCE ON INTERNET MULTIMEDIA COMPUTING AND SERVICE (ICIMCS2016), 2016, : 255 - 259
  • [43] TranSkeleton: Hierarchical Spatial-Temporal Transformer for Skeleton-Based Action Recognition
    Liu, Haowei
    Liu, Yongcheng
    Chen, Yuxin
    Yuan, Chunfeng
    Li, Bing
    Hu, Weiming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 4137 - 4148
  • [44] A Spatial-Temporal Feature Fusion Strategy for Skeleton-Based Action Recognition
    Chen, Yitian
    Xu, Yuchen
    Xie, Qianglai
    Xiong, Lei
    Yao, Leiyue
    2023 INTERNATIONAL CONFERENCE ON DATA SECURITY AND PRIVACY PROTECTION, DSPP, 2023, : 207 - 215
  • [45] Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition
    Gao, Xuehao
    Yang, Yang
    Wu, Yang
    Du, Shaoyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12130 - 12141
  • [46] A Novel Spatial-Temporal Graph for Skeleton-based Driver Action Recognition
    Li, Peng
    Lu, Meiqi
    Zhang, Zhiwei
    Shan, Donghui
    Yang, Yang
    2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2019, : 3243 - 3248
  • [47] Pyramid Spatial-Temporal Graph Transformer for Skeleton-Based Action Recognition
    Chen, Shuo
    Xu, Ke
    Jiang, Xinghao
    Sun, Tanfeng
    APPLIED SCIENCES-BASEL, 2022, 12 (18)
  • [48] Skeleton-based action recognition with local dynamic spatial-temporal aggregation
    Hu, Lianyu
    Liu, Shenglan
    Feng, Wei
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 232
  • [49] STST: Spatial-Temporal Specialized Transformer for Skeleton-based Action Recognition
    Zhang, Yuhan
    Wu, Bo
    Li, Wen
    Duan, Lixin
    Gan, Chuang
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3229 - 3237
  • [50] Spatial-temporal channel-wise attention network for action recognition
    Chen, Lin
    Liu, Yungang
    Man, Yongchao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (14) : 21789 - 21808