Split-and-recombine and vision transformer based 3D human pose estimation

Times Cited: 0
Authors
Lu, Xinyi [1]
Xu, Fan [1]
Hu, Shuiyi [2]
Yu, Tianqi [1]
Hu, Jianling [1,3]
Affiliations
[1] Soochow Univ, Suzhou, Jiangsu, Peoples R China
[2] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Guangdong, Peoples R China
[3] Wuxi Univ, Wuxi, Jiangsu, Peoples R China
Keywords
3D human pose estimation; Split-and-recombine; Visual transformer; Self-attention mechanism;
DOI
10.1007/s11760-024-03670-8
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
Regressing 3D human pose from monocular images faces many challenges, especially for rare poses and occlusions. To address these problems, we propose SR-ViT, a novel approach to 3D human pose estimation based on Split-and-Recombine and a Vision Transformer. Our method first feeds the 2D joint coordinates from multiple frames into a 3D feature extractor to obtain per-frame 3D features. After these features are fused with position embeddings, a Transformer encoder models the global correlation across all frames, and a regression head produces the final 3D pose. The 3D pose of the center frame is thus estimated from a window of consecutive frames, which effectively mitigates joint occlusion. Improvements to the structure of the 3D feature extractor and to the design of the loss function raise prediction accuracy on rare poses, and refinements to the self-attention mechanism at both global and local levels further strengthen the model. Our method has been evaluated on two benchmark datasets, Human3.6M and MPI-INF-3DHP, and experimental results show that it outperforms the baseline methods on both.
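The pipeline described in the abstract maps 2D keypoints from a window of frames to the 3D pose of the center frame. The following is a minimal PyTorch sketch of that data flow only; the plain MLP standing in for the Split-and-Recombine feature extractor, the layer sizes, and the 9-frame / 17-joint configuration are illustrative assumptions, not the paper's actual architecture, attention refinements, or loss design.

import torch
import torch.nn as nn

class SRViTSketch(nn.Module):
    def __init__(self, num_joints=17, d_model=256, num_frames=9,
                 num_layers=4, num_heads=8):
        super().__init__()
        # Hypothetical stand-in for the per-frame 3D feature extractor:
        # lifts one frame's 2D joint coordinates to a d_model-dim feature.
        self.feature_extractor = nn.Sequential(
            nn.Linear(num_joints * 2, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )
        # Learnable position embedding fused (added) to per-frame features.
        self.pos_embedding = nn.Parameter(torch.zeros(1, num_frames, d_model))
        # Transformer encoder models global correlation across all frames.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        # Regression head maps the center frame's token to 3D coordinates.
        self.head = nn.Linear(d_model, num_joints * 3)
        self.num_joints = num_joints

    def forward(self, joints_2d):
        # joints_2d: (batch, num_frames, num_joints, 2)
        b, f, j, _ = joints_2d.shape
        x = self.feature_extractor(joints_2d.reshape(b, f, j * 2))
        x = x + self.pos_embedding   # fuse position embedding information
        x = self.encoder(x)          # cross-frame self-attention
        center = x[:, f // 2]        # token of the center frame
        return self.head(center).reshape(b, self.num_joints, 3)

# Usage: a 9-frame window of 17 2D joints -> 3D pose of the center frame.
model = SRViTSketch()
pose_3d = model(torch.randn(2, 9, 17, 2))   # shape (2, 17, 3)

Predicting only the center frame while attending over the whole window is what lets temporally adjacent frames compensate for joints occluded in the center frame.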
Pages: 9
Related Papers (50 in total)
  • [41] Adaptive Multi-View and Temporal Fusing Transformer for 3D Human Pose Estimation
    Shuai, Hui
    Wu, Lele
    Liu, Qingshan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (04) : 4122 - 4135
  • [42] U-shaped spatial–temporal transformer network for 3D human pose estimation
    Honghong Yang
    Longfei Guo
    Yumei Zhang
    Xiaojun Wu
    MACHINE VISION AND APPLICATIONS, 2022, 33
  • [43] Multi-scale spatial-temporal transformer for 3D human pose estimation
    Wu, Yongpeng
    Gao, Junna
    2021 5TH INTERNATIONAL CONFERENCE ON VISION, IMAGE AND SIGNAL PROCESSING (ICVISP 2021), 2021, : 242 - 247
  • [44] Enhancing 3D Human Pose Estimation Amidst Severe Occlusion With Dual Transformer Fusion
    Ghafoor, Mehwish
    Mahmood, Arif
    Bilal, Muhammad
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 1617 - 1624
  • [45] ICRFormer: An Improving Cos-Reweighting Transformer for 3D Human Pose Estimation in Video
    Zhang, Kaixu
    Luan, Xiaoming
    Syed, Tafseer Haider Shah
    Xiang, Xuezhi
    2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023, : 436 - 441
  • [46] Deep Semantic Graph Transformer for Multi-View 3D Human Pose Estimation
    Zhang, Lijun
    Zhou, Kangkang
    Lu, Feng
    Zhou, Xiang-Dong
    Shi, Yu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7205 - 7214
  • [47] Efficient Hierarchical Multi-view Fusion Transformer for 3D Human Pose Estimation
    Zhou, Kangkang
    Zhang, Lijun
    Lu, Feng
    Zhou, Xiang-Dong
    Shi, Yu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 7512 - 7520
  • [48] GraFormer: Graph-oriented Transformer for 3D Pose Estimation
    Zhao, Weixi
    Wang, Weiqiang
    Tian, Yunjie
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 20406 - 20415
  • [49] 3D pose estimation of large and complicated workpieces based on binocular stereo vision
    Luo, Zhifeng
    Zhang, Ke
    Wang, Zhigang
    Zheng, Jian
    Chen, Yixin
    APPLIED OPTICS, 2017, 56 (24) : 6822 - 6836
  • [50] A NOVEL PASSIVE MICROMIXER BASED ON ASYMMETRIC SPLIT-AND-RECOMBINE WITH FAN-SHAPED CAVITY
    Xia, Guodong
    Li, Jian
    Wu, Hongjie
    Zhou, Mingzheng
    Wang, Haiyan
    PROCEEDINGS OF THE ASME 9TH INTERNATIONAL CONFERENCE ON NANOCHANNELS, MICROCHANNELS AND MINICHANNELS 2011, VOL 2, 2012, : 135 - 141