Skeleton MixFormer: Multivariate Topology Representation for Skeleton-based Action Recognition

Cited by: 15
Authors
Xin, Wentian [1]
Miao, Qiguang [1]
Liu, Yi [1]
Liu, Ruyi [1]
Pun, Chi-Man [2]
Shi, Cheng [3]
Affiliations
[1] Xidian University, Xi'an, People's Republic of China
[2] University of Macau, Macau, People's Republic of China
[3] Xi'an University of Technology, Xi'an, People's Republic of China
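Funding
National Key Research and Development Program of China
Keywords
video understanding; skeleton action recognition; topology representation; transformer; attention
DOI
10.1145/3581783.3611900
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract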
Vision Transformer, which performs well in various vision tasks, encounters a bottleneck in skeleton-based action recognition and falls short of advanced GCN-based methods. The root cause is that current skeleton transformers apply self-attention over the complete channel dimension of the global joints, ignoring the highly discriminative differential correlation within the channels, so it is challenging to learn the multivariate topology dynamically. To tackle this, we present Skeleton MixFormer, an innovative spatio-temporal architecture that effectively represents the physical correlations and temporal interactivity of compact skeleton data. Two essential components make up the proposed framework: 1) Spatial MixFormer. Channel grouping and mix-attention are used to calculate the dynamic multivariate topological relationships. Compared with the full-channel self-attention method, Spatial MixFormer better highlights the discriminative differences between channel groups and the interpretable learning of joint adjacency. 2) Temporal MixFormer, which consists of a Multiscale Convolution, a Temporal Transformer and a Sequential Holding Module. The multivariate temporal models ensure the richness of global difference expression and discriminate the crucial intervals in the sequence, thereby enabling more effective learning of long- and short-term dependencies in actions. Our Skeleton MixFormer demonstrates state-of-the-art (SOTA) performance across seven different settings on four standard datasets, namely NTU-60, NTU-120, NW-UCLA, and UAV-Human. Related code will be available on Skeleton-MixFormer.
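The contrast the abstract draws between full-channel self-attention and channel-grouped attention can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `grouped_joint_attention` is hypothetical, the learned query/key/value projections are omitted, and the paper's mix-attention step is not reproduced because the abstract does not specify it. The sketch only shows how splitting the channels into groups yields one joint-to-joint attention map (a candidate topology) per group instead of a single shared one.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_joint_attention(x, num_groups):
    """Self-attention over joints, computed separately per channel group.

    x: list of V joints, each a list of C channel values (one frame).
    Returns (output, attention_maps). output has shape V x C;
    attention_maps holds one V x V row-stochastic matrix per group.
    Assumption: queries, keys and values are the raw features
    (no learned projections), purely for illustration.
    """
    num_joints = len(x)
    num_channels = len(x[0])
    if num_channels % num_groups != 0:
        raise ValueError("channels must divide evenly into groups")
    d = num_channels // num_groups  # channels per group

    output = [[0.0] * num_channels for _ in range(num_joints)]
    attention_maps = []
    for g in range(num_groups):
        lo, hi = g * d, (g + 1) * d
        feats = [joint[lo:hi] for joint in x]  # V x d slice for this group

        # Scaled dot-product scores between every pair of joints.
        attn = []
        for i in range(num_joints):
            scores = [
                sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(d)
                for j in range(num_joints)
            ]
            attn.append(softmax(scores))
        attention_maps.append(attn)

        # Aggregate this group's features with its own attention map.
        for i in range(num_joints):
            for c in range(d):
                output[i][lo + c] = sum(
                    attn[i][j] * feats[j][c] for j in range(num_joints)
                )
    return output, attention_maps


# Toy input: 3 joints, 4 channels, split into 2 groups of 2 channels.
x = [[1.0, 0.0, 0.0, 2.0],
     [0.0, 1.0, 2.0, 0.0],
     [1.0, 1.0, 0.0, 0.0]]
out, maps = grouped_joint_attention(x, num_groups=2)
for g, attn in enumerate(maps):
    print("group", g, [[round(w, 3) for w in row] for row in attn])
```

With `num_groups=1` the function reduces to ordinary full-channel self-attention over joints, which gives a convenient baseline for comparing the per-group maps.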
Pages: 2211-2220
Page count: 10