Playing for 3D Human Recovery

Cited by: 1

Authors:

Cai, Zhongang [1,2]

Zhang, Mingyuan [1]

Ren, Jiawei [1]

Wei, Chen [2]

Ren, Daxuan [1]

Lin, Zhengyu [2]

Zhao, Haiyu [2]

Yang, Lei [2]

Loy, Chen Change [1]

Liu, Ziwei [1]

Affiliations:

[1] Nanyang Technol Univ, S Lab, Singapore 639798, Singapore

[2] Shanghai AI Lab, Shanghai 200240, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024 / Vol. 46 / No. 12

Keywords:

Three-dimensional displays; Annotations; Synthetic data; Shape; Training; Parametric statistics; Solid modeling; Human pose and shape estimation; 3D human recovery; parametric humans; synthetic data; dataset;

DOI:

10.1109/TPAMI.2024.3450537

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image- and video-based 3D human recovery (i.e., pose and shape estimation) has achieved substantial progress. However, due to the prohibitive cost of motion capture, existing datasets are often limited in scale and diversity. In this work, we obtain massive human sequences by playing a video game, with 3D ground truths annotated automatically. Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine, featuring a highly diverse set of subjects, actions, and scenarios. More importantly, we study the use of game-playing data and obtain five major insights. First, game-playing data is surprisingly effective. A simple frame-based baseline trained on GTA-Human outperforms more sophisticated methods by a large margin. For video-based methods, GTA-Human is even on par with the in-domain training set. Second, we discover that synthetic data provides critical complements to real data, which is typically collected indoors. Our investigation into the domain gap provides explanations for our simple yet useful data mixture strategies, offering new insights to the research community. Third, the scale of the dataset matters. The performance boost is closely related to the additional data available. A systematic study of multiple key factors (such as camera angle and body pose) reveals that model performance is sensitive to data density. Fourth, the effectiveness of GTA-Human is also attributed to its rich collection of strong supervision labels (SMPL parameters), which are otherwise expensive to acquire in real datasets. Fifth, the benefits of synthetic data extend to larger models such as deeper convolutional neural networks (CNNs) and Transformers, for which a significant impact is also observed. We hope our work can pave the way for scaling up 3D human recovery to the real world.


Pages: 10533-10545

Page count: 13
