Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

Cited by: 17
Authors
Shen, Xiaolong [1, 2]
Yang, Zongxin [1 ]
Wang, Xiaohan [1 ]
Ma, Jianxin [2 ]
Zhou, Chang [2 ]
Yang, Yi [1 ]
Affiliations
[1] Zhejiang Univ, CCAI, ReLER, Hangzhou, Zhejiang, Peoples R China
[2] Alibaba Grp, DAMO Acad, Hangzhou, Peoples R China
Keywords
REPRESENTATION;
DOI
10.1109/CVPR52729.2023.00858
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness. Although these two metrics depend on different ranges of temporal consistency, existing state-of-the-art methods treat them as a unified problem and build their networks from a single type of modeling structure (e.g., an RNN or an attention-based block). However, a single kind of modeling structure struggles to balance the learning of short-term and long-term temporal correlations and may bias the network toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details. To solve these problems, we propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT). First, a global transformer is introduced with a Masked Pose and Shape Estimation strategy for long-term modeling. The strategy encourages the global transformer to learn more inter-frame correlations by randomly masking the features of several frames. Second, a local transformer is responsible for exploiting local details on the human mesh and interacts with the global transformer through cross-attention. Moreover, a Hierarchical Spatial Correlation Regressor is introduced to refine intra-frame estimations using the decoupled global-local representation and implicit kinematic constraints. Our GLoT surpasses previous state-of-the-art methods with the fewest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M. Code is available at https://github.com/sxl142/GLoT.
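The random frame masking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); the function name `mask_frame_features`, the `mask_ratio` default, and the all-zero mask token are assumptions made for illustration. A trained model would more likely use a learnable mask token.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mask_frame_features(
    features: Sequence[Sequence[float]],
    mask_ratio: float = 0.5,                      # assumed value, not from the paper
    mask_token: Optional[Sequence[float]] = None,  # assumed: zeros when omitted
    seed: Optional[int] = None,
) -> Tuple[List[List[float]], List[int]]:
    """Replace the features of randomly chosen frames with a mask token.

    `features` holds one feature vector per frame (T x D). Returns the masked
    copy and the sorted indices of the frames that were masked.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")

    num_frames = len(features)
    dim = len(features[0]) if num_frames else 0
    token = list(mask_token) if mask_token is not None else [0.0] * dim

    # Sample distinct frame indices without replacement.
    rng = random.Random(seed)
    num_masked = int(num_frames * mask_ratio)
    masked_idx = sorted(rng.sample(range(num_frames), num_masked))
    masked_set = set(masked_idx)

    # Masked frames get the token; all other frames are copied unchanged.
    masked = [
        list(token) if t in masked_set else list(frame)
        for t, frame in enumerate(features)
    ]
    return masked, masked_idx


if __name__ == "__main__":
    # 16 frames with 4-dimensional dummy features (all non-zero).
    frames = [[float(t + 1)] * 4 for t in range(16)]
    masked_frames, idx = mask_frame_features(frames, mask_ratio=0.5, seed=0)
    print("masked frame indices:", idx)
```

Because the masked frames carry no information of their own, a temporal model fed `masked_frames` has to rely on the surrounding frames to reconstruct them, which is the intuition the abstract gives for learning more inter-frame correlations.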
Pages: 8887-8896
Page count: 10