Joints-Centered Spatial-Temporal Features Fused Skeleton Convolution Network for Action Recognition

Cited by: 2
Authors
Song, Wenfeng [1]
Chu, Tangli [2]
Li, Shuai [3]
Li, Nannan [5]
Hao, Aimin [2, 4]
Qin, Hong [6]
Affiliations
[1] Beijing Informat Sci & Technol Univ, Comp Sch, Beijing 100101, Peoples R China
[2] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
[3] Zhongguancun Lab, Beijing, Peoples R China
[4] Chinese Acad Med Sci, Res Unit Virtual Body & Virtual Surg Technol, 2019RU004, Beijing, Peoples R China
[5] Dalian Maritime Univ, Sch Informat Sci & Technol, Dalian 116024, Peoples R China
[6] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
Funding
National Natural Science Foundation of China;
Keywords
Skeleton; Feature extraction; Convolution; Visualization; Task analysis; Joints; Data mining; Skeleton-based action recognition; spatial-temporal feature fusion; PDE diffusion; NEURAL-NETWORKS; GRAPH; REPRESENTATION; DESCRIPTOR; DIFFUSION; FUSION;
DOI
10.1109/TMM.2023.3324835
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Skeleton-based action recognition is crucial for natural human-computer interaction, dynamic behavior analysis, and behavior surveillance. The key challenge is to effectively capture the intrinsic local-global clues of the activity. However, it remains challenging to efficiently leverage multidimensional information related to joints' local visual appearances, global spatial relationships, and coherent temporal cues. To address this challenge, we propose a joints-centered spatial-temporal feature-fused framework for action recognition, which exploits skeleton-based graph diffusion and convolution. Specifically, we employ Partial Differential Equation (PDE) based skeleton graph diffusion to automatically activate and diffuse the salient appearance features of joints. This approach simultaneously integrates the joints' appearance clues and their hierarchical relationships at both the super-pixel level and structure level. The diffused appearance-related features of the joints are further fused with skeleton-related spatial-temporal features, and the resulting fused features are fed into a skeleton convolution network for action recognition. Our method was extensively evaluated on two public datasets (NTU-RGBD and UWA3D), and the results demonstrate the improved accuracy and effectiveness of our approach. Our code will be public.
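The abstract does not spell out the PDE or its discretization, so the following is only a minimal sketch of the general idea of diffusing per-joint features over a skeleton graph. It assumes the heat equation dX/dt = -LX with a combinatorial graph Laplacian and explicit Euler steps; the toy five-joint chain, the function names, and the step size are all hypothetical and are not taken from the paper.

```python
import numpy as np

def graph_laplacian(num_joints, bones):
    """Combinatorial Laplacian L = D - A of an undirected skeleton graph."""
    adjacency = np.zeros((num_joints, num_joints))
    for i, j in bones:
        adjacency[i, j] = 1.0
        adjacency[j, i] = 1.0
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def diffuse(features, laplacian, step=0.1, num_steps=20):
    """Explicit Euler integration of dX/dt = -L X (assumed heat diffusion).

    `step` must be small enough for stability (step * largest eigenvalue
    of L below 2); 0.1 is safe for the toy chain used here.
    """
    x = np.asarray(features, dtype=float).copy()
    for _ in range(num_steps):
        x = x - step * (laplacian @ x)
    return x

# Hypothetical toy skeleton: five joints connected in a chain.
bones = [(0, 1), (1, 2), (2, 3), (3, 4)]
lap = graph_laplacian(5, bones)

# Two-channel "appearance" features, initially active only at joint 2.
x0 = np.zeros((5, 2))
x0[2] = [1.0, 0.5]

diffused = diffuse(x0, lap)
```

In this sketch the activation at joint 2 spreads to its neighbours along the bones while the per-channel total stays constant. A fusion step like the one the abstract describes could then, for example, concatenate `diffused` with joint-coordinate features before a graph convolution; that choice is also an assumption here.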
Pages: 4602 - 4616
Page count: 15
Related papers
50 in total
  • [21] Actionmamba: Action Spatial-Temporal Aggregation Network Based on Mamba and Gcn for Skeleton-Based Action Recognition
    North University of China, School of Electrical and Control Engineering, Shanxi, Taiyuan 030051, China
  • [22] RGB-Skeleton Fusion Network For Spatial-Temporal Action Detection
    Pan, Binbin
    Wang, Wenzhong
    Luo, Bin
    TWELFTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2020), 2021, 11720
  • [23] Multi-scale spatial-temporal convolutional neural network for skeleton-based action recognition
    Cheng, Qin
    Cheng, Jun
    Ren, Ziliang
    Zhang, Qieshi
    Liu, Jianming
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 1303 - 1315
  • [24] Online human action recognition with spatial and temporal skeleton features using a distributed camera network
    Liu, Guoliang
    Zhang, Qinghui
    Cao, Yichao
    Tian, Guohui
    Ji, Ze
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2021, 36 (12) : 7389 - 7411
  • [25] Spatial-temporal saliency action mask attention network for action recognition
    Jiang, Min
    Pan, Na
    Kong, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 71
  • [26] Adaptive recognition method of human skeleton action with spatial-temporal tensor fusion
    Jian Z.
    Nan J.
    Liu X.
    Dai W.
    Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2023, 44 (06) : 74 - 85
  • [27] TranSkeleton: Hierarchical Spatial-Temporal Transformer for Skeleton-Based Action Recognition
    Liu, Haowei
    Liu, Yongcheng
    Chen, Yuxin
    Yuan, Chunfeng
    Li, Bing
    Hu, Weiming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 4137 - 4148
  • [28] Focal and Global Spatial-Temporal Transformer for Skeleton-Based Action Recognition
    Gao, Zhimin
    Wang, Peitao
    Lv, Pei
    Jiang, Xiaoheng
    Liu, Qidong
    Wang, Pichao
    Xu, Mingliang
    Li, Wanqing
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 155 - 171
  • [29] A Spatial-Temporal Feature Fusion Strategy for Skeleton-Based Action Recognition
    Chen, Yitian
    Xu, Yuchen
    Xie, Qianglai
    Xiong, Lei
    Yao, Leiyue
    2023 INTERNATIONAL CONFERENCE ON DATA SECURITY AND PRIVACY PROTECTION, DSPP, 2023 : 207 - 215
  • [30] Spatial-temporal graph transformer network for skeleton-based temporal action segmentation
    Tian, Xiaoyan
    Jin, Ye
    Zhang, Zhao
    Liu, Peng
    Tang, Xianglong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 44273 - 44297