Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments

Cited by: 2060

Authors:

Ionescu, Catalin [1, 2]

Papava, Dragos [1]

Olaru, Vlad [1]

Sminchisescu, Cristian [3, 4]

Affiliations:

[1] Romanian Acad IMAR, Inst Math, RO-010702 Bucharest, Romania

[2] Univ Bonn, Fac Math & Nat Sci, D-53115 Bonn, Germany

[3] Lund Univ, Fac Engn, Dept Math, SE-22100 Lund, Sweden

[4] Inst Math Romanian Acad, Riyadh, Saudi Arabia

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2014 / Vol. 36 / No. 07

Keywords:

3D human pose estimation; human motion capture data; articulated body modeling; optimization; large-scale learning; structured prediction; Fourier kernel approximations; HUMAN POSE; CAPTURE;

DOI:

10.1109/TPAMI.2013.248

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce a new dataset, Human3.6M, of 3.6 Million accurate 3D Human poses, acquired by recording the performance of 5 female and 6 male subjects, under 4 different viewpoints, for training realistic human sensing systems and for evaluating the next generation of human pose estimation models and algorithms. Besides increasing the size of the datasets in the current state-of-the-art by several orders of magnitude, we also aim to complement such datasets with a diverse set of motions and poses encountered as part of typical human activities (taking photos, talking on the phone, posing, greeting, eating, etc.), with additional synchronized image, human motion capture, and time of flight (depth) data, and with accurate 3D body scans of all the subject actors involved. We also provide controlled mixed reality evaluation scenarios where 3D human models are animated using motion capture and inserted using correct 3D geometry, in complex real environments, viewed with moving cameras, and under occlusion. Finally, we provide a set of large-scale statistical models and detailed evaluation baselines for the dataset illustrating its diversity and the scope for improvement by future work in the research community. Our experiments show that our best large-scale model can leverage our full training set to obtain a 20% improvement in performance compared to a training set of the scale of the largest existing public dataset for this problem. Yet the potential for improvement by leveraging higher capacity, more complex models with our large dataset, is substantially vaster and should stimulate future research. The dataset together with code for the associated large-scale learning models, features, visualization tools, as well as the evaluation server, is available online at http://vision.imar.ro/human3.6m.


Pages: 1325 - 1339

Page count: 15
