Skeleton MixFormer: Multivariate Topology Representation for Skeleton-based Action Recognition

Cited by: 15

Authors:

Xin, Wentian [1]

Miao, Qiguang [1]

Liu, Yi [1]

Liu, Ruyi [1]

Pun, Chi-Man [2]

Shi, Cheng [3]

Affiliations:

[1] Xidian Univ, Xian, Peoples R China

[2] Univ Macau, Macau, Peoples R China

[3] Xian Univ Technol, Xian, Peoples R China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Funding:

National Key Research and Development Program of China;

Keywords:

video understanding; skeleton action recognition; topology representation; transformer; attention;

DOI:

10.1145/3581783.3611900

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Vision Transformer, which performs well in various vision tasks, encounters a bottleneck in skeleton-based action recognition and falls short of advanced GCN-based methods. The root cause is that the current skeleton transformer depends on the self-attention mechanism of the complete channel of the global joint, ignoring the highly discriminative differential correlation within the channel, so it is challenging to learn the expression of the multivariate topology dynamically. To tackle this, we present Skeleton MixFormer, an innovative spatio-temporal architecture to effectively represent the physical correlations and temporal interactivity of the compact skeleton data. Two essential components make up the proposed framework: 1) Spatial MixFormer. The channel-grouping and mix-attention are utilized to calculate the dynamic multivariate topological relationships. Compared with the full-channel self-attention method, Spatial MixFormer better highlights the channel groups' discriminative differences and the joint adjacency's interpretable learning. 2) Temporal MixFormer, which consists of Multiscale Convolution, Temporal Transformer and Sequential Holding Module. The multivariate temporal models ensure the richness of global difference expression and realize the discrimination of crucial intervals in the sequence, thereby enabling more effective learning of long and short-term dependencies in actions. Our Skeleton MixFormer demonstrates state-of-the-art (SOTA) performance across seven different settings on four standard datasets, namely NTU-60, NTU-120, NW-UCLA, and UAV-Human. Related code will be available on Skeleton-MixFormer.


Pages: 2211-2220

Page count: 10

