Split-and-recombine and vision transformer based 3D human pose estimation

Cited by: 0
Authors
Lu, Xinyi [1]
Xu, Fan [1]
Hu, Shuiyi [2]
Yu, Tianqi [1]
Hu, Jianling [1, 3]
Affiliations
[1] Soochow Univ, Suzhou, Jiangsu, Peoples R China
[2] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Guangdong, Peoples R China
[3] Wuxi Univ, Wuxi, Jiangsu, Peoples R China
Keywords
3D human pose estimation; Split-and-recombine; Visual transformer; Self-attention mechanism
DOI
10.1007/s11760-024-03670-8
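Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract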
Regressing 3D human pose from monocular images is challenging, especially for rare poses and occlusions. To address these problems, we propose SR-ViT, a novel approach to 3D human pose estimation based on Split-and-Recombine and the Visual Transformer. The method first feeds the 2D joint coordinates of multiple frames into a 3D feature extractor to obtain the 3D features of each frame. These features are fused with position embedding information, a Transformer encoder models the global correlation between all frames, and a regression head produces the final 3D pose. In this way the 3D pose of the center frame is estimated from consecutive frames, which effectively addresses the joint occlusion problem. Improving the structure of the 3D feature extractor and the design of the loss function improves prediction on rare poses, and improving the self-attention mechanism in both global and local aspects further enhances model performance. The method has been evaluated on two benchmark datasets, Human3.6M and MPI-INF-3DHP. Experimental results show that it outperforms the benchmark methods on both datasets.
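The data flow described in the abstract (per-frame features, position embedding, attention across frames, regression on the center frame) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' SR-ViT: the abstract does not specify the split-and-recombine extractor, the improved self-attention, or the loss, so a single linear layer and standard scaled dot-product attention stand in for them. All names (`init_params`, `estimate_center_pose`, `W_feat`, and so on) and all sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(num_frames, num_joints, dim, seed=0):
    """Random, untrained weights; shapes are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        # Stand-in for the 3D feature extractor (NOT split-and-recombine).
        "W_feat": rng.normal(0, scale, (num_joints * 2, dim)),
        # One embedding vector per frame position.
        "pos_emb": rng.normal(0, scale, (num_frames, dim)),
        # Single-head attention projections.
        "W_q": rng.normal(0, scale, (dim, dim)),
        "W_k": rng.normal(0, scale, (dim, dim)),
        "W_v": rng.normal(0, scale, (dim, dim)),
        # Regression head mapping a frame token to J x 3 coordinates.
        "W_head": rng.normal(0, scale, (dim, num_joints * 3)),
    }


def estimate_center_pose(joints_2d, params):
    """Map 2D joints of T consecutive frames, shape (T, J, 2),
    to a 3D pose for the center frame, shape (J, 3)."""
    num_frames, num_joints, _ = joints_2d.shape

    # 1. Per-frame features from flattened 2D joint coordinates.
    flat = joints_2d.reshape(num_frames, num_joints * 2)
    feats = np.maximum(flat @ params["W_feat"], 0.0)      # (T, D)

    # 2. Fuse with position embeddings so frame order is visible.
    tokens = feats + params["pos_emb"]                    # (T, D)

    # 3. Self-attention across all frames (one layer, one head),
    #    with a residual connection.
    q = tokens @ params["W_q"]
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))        # (T, T)
    encoded = tokens + attn @ v                           # (T, D)

    # 4. Regress the 3D pose from the center-frame token only.
    center = encoded[num_frames // 2]
    return (center @ params["W_head"]).reshape(num_joints, 3)


if __name__ == "__main__":
    T, J, D = 9, 17, 32  # assumed window length, joint count, feature size
    params = init_params(T, J, D)
    frames = np.random.default_rng(1).normal(size=(T, J, 2))
    print(estimate_center_pose(frames, params).shape)
```

Because every frame attends to every other frame, the center-frame token can draw on neighboring frames in which a joint is visible, which is the intuition behind using a multi-frame window against occlusion.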
Pages: 9