Combine multi-order representation learning and frame optimization learning for skeleton-based action recognition

Cited by: 0
Authors
Nong, Liping [1 ,3 ,5 ]
Huang, Zhuocheng [2 ,4 ]
Wang, Junyi [1 ]
Rong, Yanpeng [2 ,4 ]
Peng, Jie [1 ]
Huang, Yiping [2 ,4 ]
Affiliations
[1] Guilin Univ Elect Technol, Sch Informat & Commun, Guilin 541004, Peoples R China
[2] Guangxi Normal Univ, Sch Elect & Informat Engn, Guangxi Key Lab Brain inspired Comp & Intelligent, Guilin 541004, Peoples R China
[3] Guilin Univ Elect Technol, Key Lab Cognit Radio & Informat Proc, Minist Educ, Guilin 541004, Peoples R China
[4] Guangxi Normal Univ, Educ Dept Guangxi Zhuang Autonomous Reg, Key Lab Integrated Circuits & Microsyst, Guilin 541004, Peoples R China
[5] Guangxi Normal Univ, Coll Phys & Technol, Guilin 541004, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Skeleton-based action recognition; Graph convolutional network; Hypergraph convolutional network; Frame optimization learning;
DOI
10.1016/j.dsp.2024.104823
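Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract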
Skeleton-based action recognition has broad application prospects in fields such as virtual reality. Currently, the most popular approach is to employ Graph Convolutional Networks (GCNs) or Hypergraph Convolutional Networks (HGCNs) for this task. However, GCN-based methods may rely heavily on the physical connectivity between joints while failing to capture higher-order information about interactions among distant joints, and HGCN-based methods usually introduce unnecessary noise when capturing low-order information from skeleton structures with simple topology. In addition, current methods do not handle redundant frames and confusing frames well. These limitations hinder improvements in recognition accuracy. In this paper, we propose a novel network, called Hyper-Net, which combines multi-order representation learning and frame optimization learning for skeleton-based action recognition. Specifically, the proposed Hyper-Net contains Temporal-Channel Aggregation Graph Convolution (TCA-GC), Spatial-Temporal Aggregation Hypergraph Convolution (STA-HC) and Frame Optimization Learning (F-OL) modules. The TCA-GC aggregates low-order, local information from simple joint and bone topologies across different temporal and channel dimensions. The STA-HC captures high-order, global information from complex motion streams and addresses the problem of spatial-temporal weight imbalance. The F-OL adaptively extracts key frames and distinguishes confusing frames, thus improving the network's ability to recognize confusing actions. Extensive experiments are conducted on the NTU RGB+D, NTU RGB+D 120 and NW-UCLA datasets. Experimental results demonstrate the superiority and effectiveness of the proposed network.
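The contrast the abstract draws between low-order (graph) and high-order (hypergraph) aggregation can be illustrated with a minimal sketch. This is not the Hyper-Net implementation: the 5-joint chain, the hyperedges, the scalar features and the function names `graph_step` and `hypergraph_step` are all assumptions made for illustration, and both steps use plain mean aggregation with no learnable weights.

```python
# Toy 5-joint chain 0-1-2-3-4 (think hand - elbow - shoulder - hip - foot).
# Skeleton, hyperedges and features are illustrative assumptions,
# not the topology or data used by Hyper-Net.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
HYPEREDGES = [[0, 1, 2], [2, 3, 4], [0, 4]]  # the last one links two distant joints


def graph_step(features, edges):
    """One mean-aggregation step over physical neighbours (plus the joint itself)."""
    neighbours = {i: {i} for i in range(len(features))}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return [sum(features[j] for j in neighbours[i]) / len(neighbours[i])
            for i in range(len(features))]


def hypergraph_step(features, hyperedges):
    """Joint -> hyperedge mean, then hyperedge -> joint mean.

    Assumes every joint belongs to at least one hyperedge.
    """
    edge_feats = [sum(features[j] for j in e) / len(e) for e in hyperedges]
    out = []
    for i in range(len(features)):
        member_of = [k for k, e in enumerate(hyperedges) if i in e]
        out.append(sum(edge_feats[k] for k in member_of) / len(member_of))
    return out


# A signal present only on joint 0.
x = [1.0, 0.0, 0.0, 0.0, 0.0]
g = graph_step(x, EDGES)
h = hypergraph_step(x, HYPEREDGES)
print("graph step, joint 4:     ", g[4])
print("hypergraph step, joint 4:", h[4])
```

In this toy setting a single graph step leaves joint 4 unaffected by joint 0, because they are four bones apart, whereas a single hypergraph step passes information between them through the shared hyperedge. The extra hyperedges also mix in joints that a simple chain would keep separate, which is one way to picture the noise the abstract attributes to HGCN-based methods on simple topologies.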
Pages: 12
Related papers
50 in total
  • [41] Unsupervised skeleton-based action representation learning via relation consistency pursuit
    Wenjing Zhang
    Yonghong Hou
    Haoyuan Zhang
    Neural Computing and Applications, 2022, 34 : 20327 - 20339
  • [42] Global-local contrastive multiview representation learning for skeleton-based action
    Bian, Cunling
    Feng, Wei
    Meng, Fanbo
    Wang, Song
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 229
  • [43] Self-Supervised Representation Learning for Skeleton-Based Group Activity Recognition
    Bian, Cunling
    Feng, Wei
    Wang, Song
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5990 - 5998
  • [44] Unsupervised skeleton-based action representation learning via relation consistency pursuit
    Zhang, Wenjing
    Hou, Yonghong
    Zhang, Haoyuan
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (22): : 20327 - 20339
  • [45] A High Invariance Motion Representation for Skeleton-Based Action Recognition
    Guo, Songrui
    Pan, Huawei
    Tan, Guanghua
    Chen, Lin
    Gao, Chunming
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2016, 30 (08)
  • [46] Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition
    Chen, Tailin
    Zhou, Desen
    Wang, Jian
    Wang, Shidong
    Guan, Yu
    He, Xuming
    Ding, Errui
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4334 - 4342
  • [47] Cross-Scale Spatiotemporal Refinement Learning for Skeleton-Based Action Recognition
    Zhang, Yu
    Sun, Zhonghua
    Dai, Meng
    Feng, Jinchao
    Jia, Kebin
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 441 - 445
  • [48] Temporal-masked skeleton-based action recognition with supervised contrastive learning
    Zhao, Zhifeng
    Chen, Guodong
    Lin, Yuxiang
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (05) : 2267 - 2275
  • [49] X-Invariant Contrastive Augmentation and Representation Learning for Semi-Supervised Skeleton-Based Action Recognition
    Xu, Binqian
    Shu, Xiangbo
    Song, Yan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 3852 - 3867
  • [50] Feature difference and feature correlation learning mechanism for skeleton-based action recognition
    Qing, Ruxin
    Jiang, Min
    Kong, Jun
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (01)