Skeleton MixFormer: Multivariate Topology Representation for Skeleton-based Action Recognition

Cited by: 15
Authors
Xin, Wentian [1]
Miao, Qiguang [1]
Liu, Yi [1]
Liu, Ruyi [1]
Pun, Chi-Man [2]
Shi, Cheng [3]
Affiliations
[1] Xidian University, Xi'an, People's Republic of China
[2] University of Macau, Macau, People's Republic of China
[3] Xi'an University of Technology, Xi'an, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
video understanding; skeleton action recognition; topology representation; transformer; attention;
DOI
10.1145/3581783.3611900
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformer, which performs well in various vision tasks, encounters a bottleneck in skeleton-based action recognition and falls short of advanced GCN-based methods. The root cause is that current skeleton transformers apply self-attention over the complete channel dimension of the global joints, ignoring the highly discriminative differential correlations within the channels, which makes it difficult to learn a multivariate topology representation dynamically. To tackle this, we present Skeleton MixFormer, an innovative spatio-temporal architecture that effectively represents the physical correlations and temporal interactivity of compact skeleton data. The proposed framework has two essential components: 1) Spatial MixFormer, which uses channel grouping and mix-attention to compute dynamic multivariate topological relationships. Compared with full-channel self-attention, Spatial MixFormer better highlights the discriminative differences between channel groups and supports interpretable learning of joint adjacency. 2) Temporal MixFormer, which consists of Multiscale Convolution, Temporal Transformer and Sequential Holding Module. These multivariate temporal models ensure the richness of global difference expression and discriminate the crucial intervals in a sequence, thereby enabling more effective learning of long- and short-term dependencies in actions. Skeleton MixFormer demonstrates state-of-the-art (SOTA) performance across seven different settings on four standard datasets, namely NTU-60, NTU-120, NW-UCLA, and UAV-Human. Related code will be available on Skeleton-MixFormer.
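The contrast the abstract draws between full-channel self-attention and channel grouping can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `grouped_joint_attention`, the absence of learned query/key/value projections, and the 25-joint, 64-channel input used below are all assumptions made for illustration, and the mix-attention step that combines groups is omitted because the abstract does not specify it. The sketch only shows how splitting channels into groups yields one joint-by-joint attention map per group instead of a single map for all channels.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_joint_attention(x, num_groups):
    """Channel-grouped self-attention over skeleton joints (illustrative only).

    x:          array of shape (V, C), features of V joints in one frame.
    num_groups: number of channel groups G; C must be divisible by G.

    Returns (attn, out):
      attn has shape (G, V, V), one joint-to-joint map per channel group;
      out  has shape (V, C), the attended features with groups re-concatenated.
    """
    num_joints, channels = x.shape
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    dim = channels // num_groups

    # Split the channel axis into groups: (V, C) -> (G, V, dim).
    groups = x.reshape(num_joints, num_groups, dim).transpose(1, 0, 2)

    # Assumption: queries, keys and values are the raw group features.
    # A real model would use learned linear projections here.
    scores = groups @ groups.transpose(0, 2, 1) / np.sqrt(dim)  # (G, V, V)
    attn = softmax(scores, axis=-1)

    # Aggregate joint features within each group, then merge the groups back.
    out = attn @ groups  # (G, V, dim)
    out = out.transpose(1, 0, 2).reshape(num_joints, channels)
    return attn, out
```

Calling `grouped_joint_attention(x, 4)` on a `(25, 64)` array returns four `25 x 25` attention maps, each row of which sums to 1, while `num_groups=1` reduces to ordinary full-channel self-attention with a single map.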
Pages: 2211-2220
Number of pages: 10
Related papers
50 in total
  • [31] Temporal Extension Module for Skeleton-Based Action Recognition
    Obinata, Yuya
    Yamamoto, Takuma
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 534 - 540
  • [32] Adversarial Attack on Skeleton-Based Human Action Recognition
    Liu, Jian
    Akhtar, Naveed
    Mian, Ajmal
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (04) : 1609 - 1622
  • [33] Skeleton-based Action Recognition with Graph Involution Network
    Tang, Zhihao
    Xia, Hailun
    Gao, Xinkai
    Gao, Feng
    Feng, Chunyan
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3348 - 3354
  • [34] Skeleton-based Action Recognition of People Handling Objects
    Kim, Sunoh
    Yun, Kimin
    Park, Jongyoul
    Choi, Jin Young
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 61 - 70
  • [35] Convolutional relation network for skeleton-based action recognition
    Zhu, Jiagang
    Zou, Wei
    Zhu, Zheng
    Hu, Yiming
    NEUROCOMPUTING, 2019, 370 : 109 - 117
  • [36] Memory Attention Networks for Skeleton-based Action Recognition
    Xie, Chunyu
    Li, Ce
    Zhang, Baochang
    Chen, Chen
    Han, Jungong
    Liu, Jianzhuang
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 1639 - 1645
  • [37] SKELETON-BASED ACTION RECOGNITION USING LSTM AND CNN
    Li, Chuankun
    Wang, Pichao
    Wang, Shuang
    Hou, Yonghong
    Li, Wanqing
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2017,
  • [38] Pose Encoding for Robust Skeleton-Based Action Recognition
    Demisse, Girum G.
    Papadopoulos, Konstantinos
    Aouada, Djamila
    Ottersten, Bjorn
    PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 301 - 307
  • [39] Hypergraph Neural Network for Skeleton-Based Action Recognition
    Hao, Xiaoke
    Li, Jie
    Guo, Yingchun
    Jiang, Tao
    Yu, Ming
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 2263 - 2275
  • [40] Skeleton-based Action Recognition for Industrial Packing Process
    Chen, Zhenhui
    Hu, Haiyang
    Li, Zhongjin
    Qi, Xingchen
    Zhang, Haiping
    Hu, Hua
    Chang, Victor
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INTERNET OF THINGS, BIG DATA AND SECURITY (IOTBDS), 2020, : 36 - 45