Skeleton MixFormer: Multivariate Topology Representation for Skeleton-based Action Recognition

Cited by: 15
Authors
Xin, Wentian [1 ]
Miao, Qiguang [1 ]
Liu, Yi [1 ]
Liu, Ruyi [1 ]
Pun, Chi-Man [2 ]
Shi, Cheng [3 ]
Affiliations
[1] Xidian Univ, Xian, Peoples R China
[2] Univ Macau, Macau, Peoples R China
[3] Xian Univ Technol, Xian, Peoples R China
Funding
National Key R&D Program of China;
Keywords
video understanding; skeleton action recognition; topology representation; transformer; attention;
DOI
10.1145/3581783.3611900
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformer, which performs well across a variety of vision tasks, encounters a bottleneck in skeleton-based action recognition and falls short of advanced GCN-based methods. The root cause is that current skeleton transformers apply self-attention over the complete channel dimension of all joints, ignoring the highly discriminative differential correlations within channels, which makes it difficult to learn multivariate topology representations dynamically. To tackle this, we present Skeleton MixFormer, an innovative spatio-temporal architecture that effectively represents the physical correlations and temporal interactivity of compact skeleton data. The proposed framework consists of two essential components: 1) Spatial MixFormer, which uses channel grouping and mix-attention to compute dynamic multivariate topological relationships. Compared with full-channel self-attention, Spatial MixFormer better highlights the discriminative differences among channel groups and enables interpretable learning of joint adjacency. 2) Temporal MixFormer, which consists of a Multiscale Convolution, a Temporal Transformer, and a Sequential Holding Module. These multivariate temporal models ensure rich expression of global differences and discriminate the crucial intervals in a sequence, thereby enabling more effective learning of long- and short-term dependencies in actions. Our Skeleton MixFormer demonstrates state-of-the-art (SOTA) performance across seven different settings on four standard datasets, namely NTU-60, NTU-120, NW-UCLA, and UAV-Human. Related code will be available at Skeleton-MixFormer.
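The abstract describes channel-grouped mix-attention only at a high level. As an illustration of the general idea, the following is a minimal sketch, assuming a PyTorch setting with skeleton tensors of shape (batch, channels, frames, joints): channels are split into groups, each group learns its own joint-to-joint topology via attention, and the per-group topologies are mixed with a shared learnable adjacency. The class, parameter, and variable names (ChannelGroupedSpatialAttention, shared_topology, and so on) are illustrative assumptions and not the authors' actual implementation.

```python
# Illustrative sketch (not the authors' code) of channel-grouped spatial
# attention: per-group joint topology mixed with a shared learnable adjacency.
import torch
import torch.nn as nn


class ChannelGroupedSpatialAttention(nn.Module):
    def __init__(self, channels: int, num_joints: int, groups: int = 8):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.group_dim = channels // groups
        # Query/key projections; each channel group learns its own adjacency.
        self.to_qk = nn.Conv2d(channels, 2 * channels, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        # Shared, learnable base topology mixed into every group.
        self.shared_topology = nn.Parameter(torch.zeros(num_joints, num_joints))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints)
        n, c, t, v = x.shape
        q, k = self.to_qk(x).chunk(2, dim=1)
        # Group channels so attention over joints is computed per group.
        q = q.reshape(n, self.groups, self.group_dim * t, v)
        k = k.reshape(n, self.groups, self.group_dim * t, v)
        # Dynamic per-group topology: (batch, groups, joints, joints).
        attn = torch.einsum('ngdu,ngdv->nguv', q, k) / (self.group_dim * t) ** 0.5
        attn = attn.softmax(dim=-1) + self.shared_topology
        # Aggregate joint features with the mixed topology for each group.
        xg = x.reshape(n, self.groups, self.group_dim, t, v)
        out = torch.einsum('nguv,ngctv->ngctu', attn, xg)
        return self.proj(out.reshape(n, c, t, v))
```

For example, with an NTU-style input of shape (2, 64, 30, 25), that is 64 channels, 30 frames, and 25 joints, ChannelGroupedSpatialAttention(64, 25) returns a tensor of the same shape; each of the 8 default channel groups carries its own learned joint-adjacency pattern.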
Pages: 2211-2220
Number of pages: 10
Related Papers
50 records in total
• [41] Li, Chao; Zhong, Qiaoyong; Xie, Di; Pu, Shiliang. Skeleton-Based Action Recognition with Convolutional Neural Networks. 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2017.
• [42] Bao, Wenxia; Wang, Junyi; Yang, Xianjun; Chen, Hemu. A Spatiotemporal Fusion Network for Skeleton-Based Action Recognition. 2024 3rd International Conference on Image Processing and Media Computing (ICIPMC 2024), 2024: 347-352.
• [43] Li, Ce; Xie, Chunyu; Zhang, Baochang; Han, Jungong; Zhen, Xiantong; Chen, Jie. Memory Attention Networks for Skeleton-Based Action Recognition. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(9): 4800-4814.
• [44] Duan, Haodong; Xu, Mingze; Shuai, Bing; Modolo, Davide; Tu, Zhuowen; Tighe, Joseph; Bergamo, Alessandro. SkeleTR: Towards Skeleton-based Action Recognition in the Wild. 2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 2023: 13588-13598.
• [45] Wang, Kun; Cao, Jiuxin; Cao, Biwei; Liu, Bo. EnsCLR: Unsupervised Skeleton-based Action Recognition via Ensemble Contrastive Learning of Representation. Computer Vision and Image Understanding, 2024, 247.
• [46] Jia, Xiangze; Zhang, Ji; Wang, Zhen; Luo, Yonglong; Chen, Fulong; Yang, Gaoming. Skeleton-Based Mutual Action Recognition Using Interactive Skeleton Graph and Joint Attention. Database and Expert Systems Applications (DEXA 2022), Part II, 2022, 13427: 110-116.
• [47] Jiang, Y.; Lu, L.; Xu, J. Enhanced View-Independent Representation Method for Skeleton-Based Human Action Recognition. International Journal of Information and Communication Technology, 2021, 19(2): 201-218.
• [48] Zhong, Zhaochao; Li, Yangke; Yang, Jifang. Decoupled Representation Network for Skeleton-Based Hand Gesture Recognition. Artificial Neural Networks and Machine Learning (ICANN 2022), Part II, 2022, 13530: 469-480.
• [49] Dong, Jianfeng; Sun, Shengkai; Liu, Zhonglin; Chen, Shujie; Liu, Baolong; Wang, Xun. Hierarchical Contrast for Unsupervised Skeleton-Based Action Representation Learning. Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 1, 2023: 525-533.
• [50] Zhu, Xiaowei; Huang, Qian; Li, Chang; Cui, Jingwen; Chen, Yingying. Skeleton-Based Action Recognition with Combined Part-Wise Topology Graph Convolutional Networks. Pattern Recognition and Computer Vision (PRCV 2023), Part I, 2024, 14425: 43-59.