Split-and-recombine and vision transformer based 3D human pose estimation

Times Cited: 0
Authors
Lu, Xinyi [1 ]
Xu, Fan [1 ]
Hu, Shuiyi [2 ]
Yu, Tianqi [1 ]
Hu, Jianling [1 ,3 ]
Affiliations
[1] Soochow Univ, Suzhou, Jiangsu, Peoples R China
[2] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Guangdong, Peoples R China
[3] Wuxi Univ, Wuxi, Jiangsu, Peoples R China
Keywords
3D human pose estimation; Split-and-recombine; Visual transformer; Self-attention mechanism;
DOI
10.1007/s11760-024-03670-8
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
Regression of the 3D human pose from monocular images faces many challenges, especially for rare poses and occlusions. To address these problems, we propose SR-ViT, a novel approach to 3D human pose estimation based on Split-and-Recombine and the Vision Transformer. Our method first feeds the 2D joint coordinates of multi-frame images into a 3D feature extractor to obtain the 3D features of each frame. After these features are fused with the position embedding information, the global correlation between all frames is modeled by a Transformer encoder, and the final 3D pose is produced by a regression head. The model thus estimates the 3D pose of the center frame from consecutive frames, which effectively addresses the joint occlusion problem. Improving the structure of the 3D feature extractor and the design of the loss function raises prediction performance on rare poses, and refining the self-attention mechanism in both its global and local aspects further enhances the model. Our method has been evaluated on two benchmark datasets, Human3.6M and MPI-INF-3DHP, and experimental results show that it outperforms the benchmark methods on both.
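To make the pipeline described in the abstract concrete, the following is a minimal PyTorch-style sketch of a multi-frame 2D-to-3D lifting model of this kind: per-frame features from 2D joint coordinates, fusion with a position embedding, a Transformer encoder over all frames, and a regression head for the center frame. The class name SRViTSketch, the module sizes (17 joints, 27 frames, 256-dimensional features), and the plain MLP standing in for the Split-and-Recombine extractor are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a multi-frame 2D-to-3D lifting pipeline as described in the
# abstract. All names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn


class SRViTSketch(nn.Module):
    def __init__(self, num_joints=17, num_frames=27, feat_dim=256,
                 depth=4, heads=8):
        super().__init__()
        # Per-frame feature extractor: lifts the 2D joint coordinates of one
        # frame into a feature vector (stand-in for Split-and-Recombine).
        self.feature_extractor = nn.Sequential(
            nn.Linear(num_joints * 2, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Learnable position embedding, fused with the per-frame features.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_frames, feat_dim))
        # Transformer encoder models the correlation between all frames.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Regression head outputs the 3D pose of the center frame.
        self.head = nn.Linear(feat_dim, num_joints * 3)

    def forward(self, joints_2d):
        # joints_2d: (batch, frames, joints, 2)
        b, f, j, _ = joints_2d.shape
        x = self.feature_extractor(joints_2d.reshape(b, f, j * 2))
        x = x + self.pos_embed        # fuse features with position embedding
        x = self.encoder(x)           # global correlation across frames
        center = x[:, f // 2]         # representation of the center frame
        return self.head(center).reshape(b, j, 3)


# Usage: 27 consecutive frames of 17 2D joints -> 3D pose of the center frame.
model = SRViTSketch()
pose_3d = model(torch.randn(2, 27, 17, 2))
print(pose_3d.shape)  # torch.Size([2, 17, 3])
```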
Pages: 9
Related Papers
50 records in total
  • [31] Overview of 3D Human Pose Estimation
    Lin, Jianchu
    Li, Shuang
    Qin, Hong
    Wang, Hongchang
    Cui, Ning
    Jiang, Qian
    Jian, Haifang
    Wang, Gongming
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 134 (03): 1621 - 1651
  • [32] SlowFastFormer for 3D human pose estimation
    Zhou, Lu
    Chen, Yingying
    Wang, Jinqiao
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 243
  • [33] Demo Abstract: Vision-aided 3D Human Pose Estimation with RFID
    Yang, Chao
    Wang, Xuyu
    Mao, Shiwen
    2020 16TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING (MSN 2020), 2020, : 628 - 629
  • [34] Pose ResNet: 3D Human Pose Estimation Based on Self-Supervision
    Bao, Wenxia
    Ma, Zhongyu
    Liang, Dong
    Yang, Xianjun
    Niu, Tao
    SENSORS, 2023, 23 (06)
  • [35] A multi-granular joint tracing transformer for video-based 3D human pose estimation
    Hou, Yingying
    Huang, Zhenhua
    Zhu, Wentao
    SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (01)
  • [36] Vision Transformer-based pilot pose estimation
    Wu, Honglan
    Liu, Hao
    Sun, Youchao
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (10): : 3100 - 3110
  • [37] 3D Human Pose Estimation = 2D Pose Estimation + Matching
    Chen, Ching-Hang
    Ramanan, Deva
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 5759 - 5767
  • [38] 3D Human Pose Estimation Based on Volumetric Joint Coordinates
    Wan Y.
    Song Y.
    Liu L.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2022, 34 (09): : 1411 - 1419
  • [39] Video-Based 3D Human Pose Estimation Research
    Tao, Siting
    Zhang, Zhi
    2022 IEEE 17TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2022, : 485 - 490
  • [40] HOGFormer: high-order graph convolution transformer for 3D human pose estimation
    Xie, Yuhong
    Hong, Chaoqun
    Zhuang, Weiwei
    Liu, Lijuan
    Li, Jie
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025, 16 (01) : 599 - 610