3D-STARNET: Spatial-Temporal Attention Residual Network for Robust Action Recognition

Cited by: 0
Authors
Yang, Jun [1 ,2 ]
Sun, Shulong [2 ]
Chen, Jiayue [1 ]
Xie, Haizhen [1 ]
Wang, Yan [1 ]
Yang, Zenglong [1 ]
Affiliations
[1] China Univ Min & Technol, Big Data & Internet Things Res Ctr, Beijing 100083, Peoples R China
[2] Minist Emergency Management, Key Lab Intelligent Min & Robot, Beijing 100083, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, No. 16
Funding
National Natural Science Foundation of China;
Keywords
action recognition; spatiotemporal attention; multi-staged residual; skeleton; 3D CNN;
DOI
10.3390/app14167154
CLC Classification
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Existing skeleton-based action recognition methods face the challenges of insufficient spatiotemporal feature mining and low efficiency of information transmission. To solve these problems, this paper proposes the Spatial-Temporal Attention Residual Network for 3D human action recognition (3D-STARNET). The model significantly improves action recognition performance through three main innovations: (1) conversion from skeleton points to heat maps. Using a Gaussian transform to convert skeleton-point data into heat maps reduces the model's strong dependence on the raw skeleton coordinates and enhances the stability and robustness of the data; (2) a spatiotemporal attention mechanism (STA). A novel spatiotemporal attention mechanism is proposed that focuses on extracting key frames and key areas within frames, significantly enhancing the model's ability to identify behavioral patterns; (3) a multi-stage residual structure (MS-Residual). The multi-stage residual structure improves the efficiency of data transmission through the network, mitigates the vanishing-gradient problem in deep networks, and helps improve the recognition efficiency of the model. Experimental results on the NTU-RGBD120 dataset show that 3D-STARNET significantly improves action recognition accuracy, with the overall network reaching a top-1 accuracy of 96.74%. The method not only addresses the robustness shortcomings of existing approaches but also improves the capture of spatiotemporal features, providing an efficient and widely applicable solution for skeleton-based action recognition.
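The abstract does not give implementation details for innovation (1), the Gaussian skeleton-to-heatmap conversion. A minimal sketch of the general technique, assuming 2D joint coordinates in pixel space and a fixed Gaussian width (`sigma` is an illustrative parameter, not taken from the paper), might look like:

```python
import numpy as np

def joints_to_heatmaps(joints, height, width, sigma=2.0):
    """Render 2D skeleton joints as per-joint Gaussian heat maps.

    joints: array of shape (J, 2) holding (x, y) pixel coordinates.
    Returns an array of shape (J, height, width); each channel is a
    Gaussian bump centered on one joint, so the representation is
    tolerant to small localization errors in the raw coordinates.
    """
    # Coordinate grids: ys varies over rows, xs over columns.
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.empty((len(joints), height, width), dtype=np.float32)
    for j, (x, y) in enumerate(joints):
        # Unnormalized Gaussian, peak value 1.0 at the joint location.
        maps[j] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

# Example: two joints rendered onto a 64x64 canvas.
heat = joints_to_heatmaps(np.array([[16.0, 20.0], [40.0, 44.0]]), 64, 64)
```

Stacking such per-frame heat maps along the time axis yields the volumetric input a 3D CNN expects; the paper's exact rendering choices (resolution, normalization, per-joint vs. per-limb channels) are not specified here.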
Pages: 13