Self-Supervised 3D Representation Learning of Dressed Humans From Social Media Videos

Authors
Jafarian, Yasamin [1]
Park, Hyun Soo [1]
Affiliations
[1] Univ Minnesota, Minneapolis, MN 55455 USA
Keywords
Depth estimation; dataset; high fidelity human reconstruction; normal estimation; single view 3D reconstruction; self-supervised learning
DOI
10.1109/TPAMI.2022.3231558
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
A key challenge in learning a visual representation of the high-fidelity 3D geometry of dressed humans is the limited availability of ground truth data (e.g., 3D scanned models), which degrades the performance of 3D human reconstruction when applied to real-world imagery. We address this challenge by leveraging a new data resource: a number of social media dance videos that span diverse appearances, clothing styles, performances, and identities. Each video depicts the dynamic movements of the body and clothes of a single person but lacks 3D ground truth geometry. To learn a visual representation from these videos, we present a new self-supervised learning method that uses the local transformation that warps the predicted local geometry of the person in one image to that in another image at a different time instant. This allows self-supervision by enforcing temporal coherence over the predictions. In addition, we jointly learn depth along with surface normals, which are highly responsive to local texture, wrinkles, and shading, by maximizing their geometric consistency. Our method is end-to-end trainable, resulting in high-fidelity depth estimation that predicts fine geometry faithful to the input real image. We further provide a theoretical bound of self-supervised learning via an uncertainty analysis that characterizes the performance of the self-supervised learning without training. We demonstrate that our method outperforms state-of-the-art human depth estimation and human shape recovery approaches on both real and rendered images.
Pages: 8969-8983
Number of pages: 15