3D-STARNET: Spatial-Temporal Attention Residual Network for Robust Action Recognition

被引:0
|
作者
Yang, Jun [1 ,2 ]
Sun, Shulong [2 ]
Chen, Jiayue [1 ]
Xie, Haizhen [1 ]
Wang, Yan [1 ]
Yang, Zenglong [1 ]
机构
[1] China Univ Min & Technol, Big Data & Internet Things Res Ctr, Beijing 100083, Peoples R China
[2] Minist Emergency Management, Key Lab Intelligent Min & Robot, Beijing 100083, Peoples R China
来源
APPLIED SCIENCES-BASEL | 2024年 / 14卷 / 16期
基金
中国国家自然科学基金;
关键词
action recognition; spatiotemporal attention; multi-staged residual; skeleton; 3D CNN;
D O I
10.3390/app14167154
中图分类号
O6 [化学];
学科分类号
0703 ;
摘要
Existing skeleton-based action recognition methods face the challenges of insufficient spatiotemporal feature mining and a low efficiency of information transmission. To solve these problems, this paper proposes a model called the Spatial-Temporal Attention Residual Network for 3D human action recognition (3D-STARNET). This model significantly improves the performance of action recognition through the following three main innovations: (1) the conversion from skeleton points to heat maps. Using Gaussian transform to convert skeleton point data into heat maps effectively reduces the model's strong dependence on the original skeleton point data and enhances the stability and robustness of the data; (2) a spatiotemporal attention mechanism (STA). A novel spatiotemporal attention mechanism is proposed, focusing on the extraction of key frames and key areas within frames, which significantly enhances the model's ability to identify behavioral patterns; (3) a multi-stage residual structure (MS-Residual). The introduction of a multi-stage residual structure improves the efficiency of data transmission in the network, solves the gradient vanishing problem in deep networks, and helps to improve the recognition efficiency of the model. Experimental results on the NTU-RGBD120 dataset show that 3D-STARNET has significantly improved the accuracy of action recognition, and the top1 accuracy of the overall network reached 96.74%. This method not only solves the robustness shortcomings of existing methods, but also improves the ability to capture spatiotemporal features, providing an efficient and widely applicable solution for action recognition based on skeletal data.
引用
收藏
页数:13
相关论文
共 50 条
  • [1] R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition
    Liu, Quanle
    Che, Xiangjiu
    Bie, Mei
    IEEE ACCESS, 2019, 7 : 82246 - 82255
  • [2] Spatial-Temporal Convolutional Attention Network for Action Recognition
    Luo, Huilan
    Chen, Han
    Computer Engineering and Applications, 2023, 59 (09): : 150 - 158
  • [3] Spatial-Temporal Attention for Action Recognition
    Sun, Dengdi
    Wu, Hanqing
    Ding, Zhuanlian
    Luo, Bin
    Tang, Jin
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164 : 854 - 864
  • [4] Spatial-temporal saliency action mask attention network for action recognition
    Jiang, Min
    Pan, Na
    Kong, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 71
  • [5] Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos
    Du, Wenbin
    Wang, Yali
    Qiao, Yu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (03) : 1347 - 1360
  • [6] Joint spatial-temporal attention for action recognition
    Yu, Tingzhao
    Guo, Chaoxu
    Wang, Lingfeng
    Gu, Huxiang
    Xiang, Shiming
    Pan, Chunhong
    PATTERN RECOGNITION LETTERS, 2018, 112 : 226 - 233
  • [7] Spatial-temporal channel-wise attention network for action recognition
    Chen, Lin
    Liu, Yungang
    Man, Yongchao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (14) : 21789 - 21808
  • [8] Recurrent attention network using spatial-temporal relations for action recognition
    Zhang, Mingxing
    Yang, Yang
    Ji, Yanli
    Xie, Ning
    Shen, Fumin
    SIGNAL PROCESSING, 2018, 145 : 137 - 145
  • [9] Spatial-temporal channel-wise attention network for action recognition
    Lin Chen
    Yungang Liu
    Yongchao Man
    Multimedia Tools and Applications, 2021, 80 : 21789 - 21808
  • [10] STARNeT: Multidimensional spatial-temporal attention recall network for accurate encrypted traffic classification
    Guan, Xinjie
    Zhu, Shuyan
    Wan, Xili
    Wu, Yaping
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2025, 236