Separable 3D residual attention network for human action recognition

Citations: 1
Authors
Zhang, Zufan [1 ]
Peng, Yue [1 ]
Gan, Chenquan [1 ]
Abate, Andrea Francesco [2 ]
Zhu, Lianxiang [3 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Sch Commun & Informat Engn, Chongqing 400065, Peoples R China
[2] Univ Salerno, Dept Comp Sci, Via Giovanni Paolo II 132, I-84084 Fisciano, SA, Italy
[3] Xian Shiyou Univ, Sch Comp Sci, Xian 710065, Peoples R China
Keywords
Human computer interaction; Human action recognition; Residual network; Attention mechanism; Multi-stage training strategy; SPATIAL-TEMPORAL ATTENTION; FEATURES; LSTM;
DOI
10.1007/s11042-022-12972-3
Chinese Library Classification (CLC) number
TP [Automation and computer technology];
Discipline code
0812;
Abstract
As an important research topic in computer vision, human action recognition is regarded as a crucial means of communication and interaction between humans and computers. To help computers automatically recognize human behaviors and accurately understand human intentions, this paper proposes a separable three-dimensional residual attention network (Sep-3D RAN), a lightweight network that extracts informative spatial-temporal representations for video-based human-computer interaction. Specifically, Sep-3D RAN is constructed by stacking multiple separable three-dimensional residual attention blocks, in each of which a standard three-dimensional convolution is approximated by a cascaded two-dimensional spatial convolution and a one-dimensional temporal convolution. A dual attention mechanism is then built by sequentially embedding a channel attention sub-module and a spatial attention sub-module in each residual block, thereby acquiring more discriminative features and improving the model's guidance capability. Furthermore, a multi-stage training strategy is used to train Sep-3D RAN, which effectively alleviates over-fitting. Finally, experimental results demonstrate that Sep-3D RAN surpasses existing state-of-the-art methods.
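The separable (2+1)D factorization described in the abstract can be illustrated with a simple parameter-count comparison: a full t x k x k 3D kernel is replaced by a 1 x k x k spatial kernel followed by a t x 1 x 1 temporal kernel. This is a minimal sketch of the general technique, not the paper's implementation; the channel sizes and the choice of intermediate channel count below are hypothetical.

```python
# Parameter-count sketch of a separable (2+1)D convolution versus a full 3D
# convolution. All channel counts and kernel sizes here are illustrative.

def conv3d_params(c_in, c_out, t, k):
    """Parameters of a full 3D convolution with a t x k x k kernel (no bias)."""
    return c_in * c_out * t * k * k

def sep3d_params(c_in, c_out, t, k, c_mid=None):
    """Parameters of the factorized (2+1)D convolution (no bias).

    c_mid is the intermediate channel count between the 2D spatial stage and
    the 1D temporal stage; here it defaults to c_out (a hypothetical choice).
    """
    if c_mid is None:
        c_mid = c_out
    spatial = c_in * c_mid * 1 * k * k    # 2D spatial convolution (1 x k x k)
    temporal = c_mid * c_out * t * 1 * 1  # 1D temporal convolution (t x 1 x 1)
    return spatial + temporal

# Example: 64 -> 128 channels with a 3 x 3 x 3 kernel.
full = conv3d_params(64, 128, t=3, k=3)  # 64*128*27 = 221184
sep = sep3d_params(64, 128, t=3, k=3)    # 64*128*9 + 128*128*3 = 122880
print(full, sep, round(sep / full, 3))
```

Besides reducing parameters, the factorization inserts an extra non-linearity between the spatial and temporal stages, which is one reason (2+1)D blocks are often easier to optimize than full 3D convolutions.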
Pages: 5435-5453
Page count: 19
Related Papers
50 records in total
  • [21] Human Action Recognition Based on 3D Convolution and Multi-Attention Transformer
    Liu, Minghua
    Li, Wenjing
    He, Bo
    Wang, Chuanxu
    Qu, Lianen
    APPLIED SCIENCES-BASEL, 2025, 15 (05):
  • [22] 3D CNN for Human Action Recognition
    Boualia, Sameh Neili
    Ben Amara, Najoua Essoukri
    2021 18TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2021, : 276 - 282
  • [23] 3D residual attention network for hyperspectral image classification
    Li, Huizhen
    Wei, Kanghui
    Zhang, Bengong
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2023, 21 (04)
  • [24] 3D Contextual Transformer & Double Inception Network for Human Action Recognition
    Liu, Enqi
    Hirota, Kaoru
    Liu, Chang
    Dai, Yaping
    Proceedings of the 35th Chinese Control and Decision Conference, CCDC 2023, 2023, : 1795 - 1800
  • [26] Res3ATN-Deep 3D Residual Attention Network for Hand Gesture Recognition in Videos
    Dhingra, Naina
    Kunz, Andreas
    2019 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2019), 2019, : 491 - 501
  • [27] 3D Convolutional Neural Network for Action Recognition
    Zhang, Junhui
    Chen, Li
    Tian, Jing
    COMPUTER VISION, PT I, 2017, 771 : 600 - 607
  • [28] Action recognition with motion map 3D network
    Sun, Yuchao
    Wu, Xinxiao
    Yu, Wennan
    Yu, Feiwu
    NEUROCOMPUTING, 2018, 297 : 33 - 39
  • [29] Recognition of Plasma-Treated Rice Based on 3D Deep Residual Network with Attention Mechanism
    Tang, Xiaojiang
    Zhao, Wenhao
    Guo, Junwei
    Li, Baoxia
    Liu, Xin
    Wang, Yuan
    Huang, Feng
    MATHEMATICS, 2023, 11 (07)
  • [30] A novel 3D shape recognition method based on double-channel attention residual network
    Ziping Ma
    Jie Zhou
    Jinlin Ma
    Tingting Li
    Multimedia Tools and Applications, 2022, 81 : 32519 - 32548