GTPT: Group-Based Token Pruning Transformer for Efficient Human Pose Estimation

Times cited: 0

Authors:

Wang, Haonan [1,2]

Liu, Jie [1]

Tang, Jie [1]

Wu, Gangshan [1]

Xu, Bo [2]

Chou, Yanbing [2]

Wang, Yong [2]

Affiliations:

[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China

[2] Cainiao Network, Hangzhou, Peoples R China

Source:

COMPUTER VISION - ECCV 2024, PT LXIX | 2025 / Vol. 15127

Keywords:

Efficient human pose estimation; Whole-body pose estimation; Transformer; Token pruning; Group

DOI:

10.1007/978-3-031-72890-7_13

CLC classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches are difficult to apply in industry because of their large number of parameters and high computational overhead. Efficient human pose estimation remains challenging, especially for whole-body pose estimation with numerous keypoints. While most current methods for efficient human pose estimation rely primarily on CNNs, we propose the Group-based Token Pruning Transformer (GTPT), which fully harnesses the advantages of the Transformer. GTPT alleviates the computational burden by gradually introducing keypoints in a coarse-to-fine manner, minimizing computational overhead while maintaining high performance. In addition, GTPT groups keypoint tokens and prunes visual tokens to improve model performance while reducing redundancy. We propose Multi-Head Group Attention (MHGA) between different groups to achieve global interaction with little computational overhead. We conducted experiments on COCO and COCO-WholeBody. Compared with other methods, GTPT achieves higher performance with less computation, especially for whole-body pose estimation with numerous keypoints.
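The abstract says GTPT prunes visual tokens to reduce redundancy but does not state the pruning rule. As a rough illustration only, the sketch below shows one generic way to prune tokens: score each visual token by the softmax attention it receives from keypoint-token queries, then keep the top fraction. The scoring rule, the names `softmax` and `prune_visual_tokens`, and the `keep_ratio` parameter are assumptions for illustration, not GTPT's published method. Keypoint grouping, coarse-to-fine keypoint introduction, and MHGA are not modeled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prune_visual_tokens(keypoint_tokens: Sequence[Vector],
                        visual_tokens: Sequence[Vector],
                        keep_ratio: float) -> List[int]:
    """Return indices of the visual tokens to keep, in their original order.

    Hypothetical scoring rule (not taken from the paper): each keypoint token
    is a query, and a visual token's importance is its softmax attention
    weight averaged over all keypoint queries.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scale = 1.0 / math.sqrt(len(visual_tokens[0]))  # scaled dot-product
    scores = [0.0] * len(visual_tokens)
    for q in keypoint_tokens:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in visual_tokens]
        for i, w in enumerate(softmax(logits)):
            scores[i] += w / len(keypoint_tokens)
    num_keep = max(1, math.ceil(keep_ratio * len(visual_tokens)))
    ranked = sorted(range(len(visual_tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_keep])  # restore spatial order of kept tokens


# Usage: two keypoint queries, four visual tokens, keep half of them.
keypoints = [[1.0, 0.0], [0.0, 1.0]]
visual = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0], [0.1, 0.1]]
print(prune_visual_tokens(keypoints, visual, 0.5))  # [0, 1]
```

Returning the kept indices in their original order, rather than in score order, preserves the spatial layout of the retained tokens, which later attention layers in a pose Transformer would typically rely on.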


Pages: 213-230

Number of pages: 18
