Skeleton-weighted and multi-scale temporal-driven network for video action recognition

Cited by: 0
Authors
Xu, Ziqi [1]
Zhang, Jie [2, 3]
Zhang, Peng [2, 3]
Ding, Pengfei [4]
Affiliations
[1] Donghua Univ, Coll Comp Sci & Technol, Shanghai, Peoples R China
[2] Minist Educ, Engn Res Ctr Digitalized Textile & Fash Technol, Shanghai, Peoples R China
[3] Donghua Univ, Shanghai Engn Res Ctr Ind Big Data & Intelligent, Inst Artificial Intelligence, Shanghai, Peoples R China
[4] Donghua Univ, Coll Mech Engn, Shanghai, Peoples R China
Keywords
video action recognition; multi-model; feature extraction; temporal modeling; feature fusion; RGB;
DOI
10.1117/1.JEI.33.6.063056
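Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes
0808; 0809;
Abstract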
Sequential and causal relationships among actions are critical for accurate video interpretation. Therefore, capturing both short-term and long-term temporal information is essential for effective action recognition. Current research, however, primarily focuses on fusing spatial features from diverse modalities for short-term action recognition, inadequately modeling the complex temporal dependencies in videos, leading to suboptimal performance. To address this limitation, we propose a skeleton-weighted and multi-scale temporal-driven action recognition network that integrates RGB and skeleton modalities to effectively capture both short-term and long-term temporal information. First, we propose a temporal-enhanced adaptive graph convolutional network. This network derives motion attention masks from the skeletal joints and transfers them to RGB videos to generate visually salient regions, thereby achieving a concise and effective input representation. Subsequently, we develop a multi-scale local-global temporal modeling network driven by a self-attention mechanism, which effectively captures fine-grained local details of individual actions along with global temporal relationships among actions across multiple temporal resolutions. Moreover, we design a multi-level adaptive temporal scale mixer module that efficiently integrates multi-scale features, creating a unified temporal feature representation to ensure temporal consistency. Finally, we conducted extensive experiments on the NTU-RGBD-60, NTU-RGBD-120, NW-UCLA, and Kinetics datasets to validate the effectiveness of the proposed method. (c) 2024 SPIE and IS&T
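The idea of transferring skeleton-derived motion attention onto RGB frames can be illustrated with a minimal sketch. This is not the authors' temporal-enhanced adaptive graph convolutional network: it swaps in a simple heuristic (accumulated joint displacement, softmax-normalized, spread with a Gaussian kernel) purely to show the data flow. All function names, the `sigma` parameter, and the Gaussian spreading are assumptions made for illustration.

```python
import math

def joint_motion_weights(joints):
    """joints: list of frames, each a list of (x, y) joint coordinates.
    Returns one weight per joint (summing to 1); joints that move more get
    larger weights. Hypothetical stand-in for a learned attention module."""
    num_joints = len(joints[0])
    motion = [0.0] * num_joints
    # Accumulate frame-to-frame displacement for every joint.
    for prev, curr in zip(joints, joints[1:]):
        for j in range(num_joints):
            motion[j] += math.dist(prev[j], curr[j])
    # Numerically stable softmax over the accumulated motion.
    peak = max(motion)
    exps = [math.exp(m - peak) for m in motion]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mask(frame_joints, weights, height, width, sigma=1.5):
    """Spread each joint's weight over the image grid with a Gaussian
    (an assumed kernel), then scale the mask so its maximum is 1."""
    mask = [[0.0] * width for _ in range(height)]
    for (x, y), w in zip(frame_joints, weights):
        for r in range(height):
            for c in range(width):
                d2 = (c - x) ** 2 + (r - y) ** 2
                mask[r][c] += w * math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in mask)
    return [[v / peak for v in row] for row in mask]

def apply_mask(frame, mask):
    """Element-wise product of a single-channel frame and the mask."""
    return [[p * m for p, m in zip(f_row, m_row)]
            for f_row, m_row in zip(frame, mask)]

# Toy data: joint 0 stays still, joint 1 moves to the right.
joints = [
    [(1, 1), (5, 5)],
    [(1, 1), (6, 5)],
    [(1, 1), (7, 5)],
]
weights = joint_motion_weights(joints)
mask = attention_mask(joints[-1], weights, height=8, width=8)
frame = [[1.0] * 8 for _ in range(8)]  # uniform grey frame
salient = apply_mask(frame, mask)
```

In this toy setup the region around the moving joint is kept at nearly full intensity while the area around the static joint is strongly attenuated, which is the qualitative effect of "visually salient regions" described in the abstract.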
Pages: 23