GTPT: Group-Based Token Pruning Transformer for Efficient Human Pose Estimation

Authors
Wang, Haonan [1, 2]
Liu, Jie [1 ]
Tang, Jie [1 ]
Wu, Gangshan [1 ]
Xu, Bo [2 ]
Chou, Yanbing [2 ]
Wang, Yong [2 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Cainiao Network, Hangzhou, Peoples R China
Keywords
Efficient human pose estimation; Whole-body pose estimation; Transformer; Token pruning; Group
DOI
10.1007/978-3-031-72890-7_13
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches see limited industrial use because of their large parameter counts and computational overhead. Efficient human pose estimation remains a challenge, especially for whole-body pose estimation with numerous keypoints. While most current methods for efficient human pose estimation rely primarily on CNNs, we propose the Group-based Token Pruning Transformer (GTPT), which fully exploits the advantages of the Transformer. GTPT reduces the computational burden by introducing keypoints gradually in a coarse-to-fine manner, which keeps computational overhead low while maintaining high performance. In addition, GTPT groups keypoint tokens and prunes visual tokens to improve model performance while reducing redundancy. We propose Multi-Head Group Attention (MHGA) between different groups to achieve global interaction with little computational overhead. We conducted experiments on COCO and COCO-WholeBody. The experimental results show that, compared with other methods, GTPT achieves higher performance with less computation, especially for whole-body pose estimation with numerous keypoints.
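The abstract mentions pruning visual tokens but does not state the criterion used. As a rough illustration only, the sketch below assumes one common choice: score each visual token by the mean attention it receives from the keypoint tokens and keep the top fraction. The function name `prune_visual_tokens`, the `attn` layout, and the `keep_ratio` parameter are all hypothetical and are not taken from the paper.

```python
import math


def prune_visual_tokens(attn, keep_ratio):
    """Return the sorted indices of the visual tokens to keep.

    attn[k][v] is the attention weight from keypoint token k to visual
    token v. This input layout and the scoring rule are assumptions made
    for illustration; the abstract does not define the pruning criterion.
    """
    if not attn or not attn[0]:
        return []
    num_visual = len(attn[0])
    # Score each visual token by the mean attention it receives
    # from all keypoint tokens.
    scores = [sum(row[v] for row in attn) / len(attn) for v in range(num_visual)]
    # Keep a fixed fraction of the tokens, but never fewer than one.
    num_keep = max(1, math.ceil(keep_ratio * num_visual))
    # Rank by descending score; ties go to the lower index.
    ranked = sorted(range(num_visual), key=lambda v: (-scores[v], v))
    # Return the kept indices in their original spatial order.
    return sorted(ranked[:num_keep])


# Two keypoint tokens attending over four visual tokens.
attn = [
    [0.10, 0.60, 0.25, 0.05],
    [0.30, 0.40, 0.10, 0.20],
]
kept = prune_visual_tokens(attn, keep_ratio=0.5)
```

Returning indices rather than the pruned tensors keeps the sketch independent of any particular deep learning framework; in a real model the indices would be used to gather the surviving tokens before the next Transformer layer.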
Pages: 213-230
Page count: 18