Playing for 3D Human Recovery

Cited by: 1
Authors
Cai, Zhongang [1, 2]
Zhang, Mingyuan [1]
Ren, Jiawei [1]
Wei, Chen [2]
Ren, Daxuan [1]
Lin, Zhengyu [2]
Zhao, Haiyu [2]
Yang, Lei [2]
Loy, Chen Change [1]
Liu, Ziwei [1]
Affiliations
[1] Nanyang Technological University, S-Lab, Singapore 639798, Singapore
[2] Shanghai AI Laboratory, Shanghai 200240, People's Republic of China
Keywords
Three-dimensional displays; Annotations; Synthetic data; Shape; Training; Parametric statistics; Solid modeling; Human pose and shape estimation; 3D human recovery; parametric humans; synthetic data; dataset
DOI
10.1109/TPAMI.2024.3450537
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image- and video-based 3D human recovery (i.e., pose and shape estimation) has achieved substantial progress. However, due to the prohibitive cost of motion capture, existing datasets are often limited in scale and diversity. In this work, we obtain massive human sequences by playing a video game with automatically annotated 3D ground truths. Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine, featuring a highly diverse set of subjects, actions, and scenarios. More importantly, we study the use of game-playing data and obtain five major insights. First, game-playing data is surprisingly effective. A simple frame-based baseline trained on GTA-Human outperforms more sophisticated methods by a large margin. For video-based methods, GTA-Human is even on par with the in-domain training set. Second, we discover that synthetic data provides critical complements to real data, which is typically collected indoors. Our investigation into the domain gap explains why our simple yet useful data mixture strategies work, offering new insights to the research community. Third, the scale of the dataset matters. The performance boost is closely related to the amount of additional data available. A systematic study of multiple key factors (such as camera angle and body pose) reveals that model performance is sensitive to data density. Fourth, the effectiveness of GTA-Human is also attributed to its rich collection of strong supervision labels (SMPL parameters), which are otherwise expensive to acquire in real datasets. Fifth, the benefits of synthetic data extend to larger models such as deeper convolutional neural networks (CNNs) and Transformers, for which a significant impact is also observed. We hope our work could pave the way for scaling up 3D human recovery to the real world.
Pages: 10533-10545
Number of pages: 13