Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

Cited by: 17

Authors:

Shen, Xiaolong ^{[1,2]}

Yang, Zongxin ^{[1]}

Wang, Xiaohan ^{[1]}

Ma, Jianxin ^{[2]}

Zhou, Chang ^{[2]}

Yang, Yi ^{[1]}

Affiliations:

[1] Zhejiang Univ, CCAI, ReLER, Hangzhou, Zhejiang, Peoples R China

[2] Alibaba Grp, DAMO Acad, Hangzhou, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Keywords:

REPRESENTATION;

DOI:

10.1109/CVPR52729.2023.00858

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness. Although these two metrics are responsible for different ranges of temporal consistency, existing state-of-the-art methods treat them as a unified problem and use monotonous modeling structures (e.g., RNN or attention-based block) to design their networks. However, a single kind of modeling structure makes it difficult to balance the learning of short-term and long-term temporal correlations and may bias the network toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details. To solve these problems, we propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT). First, a global transformer is introduced with a Masked Pose and Shape Estimation strategy for long-term modeling. The strategy stimulates the global transformer to learn more inter-frame correlations by randomly masking the features of several frames. Second, a local transformer is responsible for exploiting local details on the human mesh and interacting with the global transformer by leveraging cross-attention. Moreover, a Hierarchical Spatial Correlation Regressor is further introduced to refine intra-frame estimations by decoupled global-local representation and implicit kinematic constraints. Our GLoT surpasses previous state-of-the-art methods with the fewest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M. Code is available at https://github.com/sxl142/GLoT.
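
The masking step mentioned in the abstract (randomly masking the features of several frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_frame_features`, the `mask_ratio` value, and the all-zero `mask_token` are hypothetical choices made only for illustration, and the record does not state how GLoT selects or represents masked frames.

```python
import random


def mask_frame_features(features, mask_ratio=0.5, mask_token=None, seed=None):
    """Replace the features of randomly chosen frames with a mask token.

    features: list of per-frame feature vectors (each a list of floats).
    Returns (masked_features, masked_indices). The input is not modified.
    """
    rng = random.Random(seed)
    num_frames = len(features)
    dim = len(features[0]) if num_frames else 0
    if mask_token is None:
        # Assumption: an all-zero placeholder stands in for whatever
        # token a real model would use.
        mask_token = [0.0] * dim
    num_masked = int(num_frames * mask_ratio)
    # Sample distinct frame indices to mask.
    masked_indices = sorted(rng.sample(range(num_frames), num_masked))
    masked_set = set(masked_indices)
    masked = [
        list(mask_token) if i in masked_set else list(frame)
        for i, frame in enumerate(features)
    ]
    return masked, masked_indices


# Toy sequence: 16 frames, each with a 4-dimensional feature vector.
features = [[float(t)] * 4 for t in range(16)]
masked, idx = mask_frame_features(features, mask_ratio=0.5, seed=0)
print(len(idx), "of", len(features), "frames masked")
```

In a training setup along these lines, a temporal model would receive `masked` and be asked to predict pose and shape for every frame, so that predictions for the masked frames must rely on the remaining frames in the sequence.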

Citation

Pages: 8887-8896

Page count: 10


