Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments

Cited by: 2060
Authors
Ionescu, Catalin [1, 2]
Papava, Dragos [1]
Olaru, Vlad [1]
Sminchisescu, Cristian [3, 4]
Affiliations
[1] Romanian Acad IMAR, Inst Math, RO-010702 Bucharest, Romania
[2] Univ Bonn, Fac Math & Nat Sci, D-53115 Bonn, Germany
[3] Lund Univ, Fac Engn, Dept Math, SE-22100 Lund, Sweden
[4] Inst Math Romanian Acad, Bucharest, Romania
Keywords
3D human pose estimation; human motion capture data; articulated body modeling; optimization; large-scale learning; structured prediction; Fourier kernel approximations; HUMAN POSE; CAPTURE
DOI
10.1109/TPAMI.2013.248
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce a new dataset, Human3.6M, of 3.6 Million accurate 3D Human poses, acquired by recording the performance of 5 female and 6 male subjects, under 4 different viewpoints, for training realistic human sensing systems and for evaluating the next generation of human pose estimation models and algorithms. Besides increasing the size of the datasets in the current state-of-the-art by several orders of magnitude, we also aim to complement such datasets with a diverse set of motions and poses encountered as part of typical human activities (taking photos, talking on the phone, posing, greeting, eating, etc.), with additional synchronized image, human motion capture, and time of flight (depth) data, and with accurate 3D body scans of all the subject actors involved. We also provide controlled mixed reality evaluation scenarios where 3D human models are animated using motion capture and inserted using correct 3D geometry, in complex real environments, viewed with moving cameras, and under occlusion. Finally, we provide a set of large-scale statistical models and detailed evaluation baselines for the dataset illustrating its diversity and the scope for improvement by future work in the research community. Our experiments show that our best large-scale model can leverage our full training set to obtain a 20% improvement in performance compared to a training set of the scale of the largest existing public dataset for this problem. Yet the potential for improvement by leveraging higher capacity, more complex models with our large dataset, is substantially vaster and should stimulate future research. The dataset together with code for the associated large-scale learning models, features, visualization tools, as well as the evaluation server, is available online at http://vision.imar.ro/human3.6m.
Pages: 1325-1339
Number of pages: 15