Self-Supervised 3D Representation Learning of Dressed Humans From Social Media Videos

Cited by: 0
Authors
Jafarian, Yasamin [1]
Park, Hyun Soo [1]
Affiliations
[1] Univ Minnesota, Minneapolis, MN 55455 USA
Keywords
Depth estimation; dataset; high fidelity human reconstruction; normal estimation; single view 3D reconstruction; self-supervised learning
DOI
10.1109/TPAMI.2022.3231558
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A key challenge of learning a visual representation for the 3D high fidelity geometry of dressed humans lies in the limited availability of ground truth data (e.g., 3D scanned models), which degrades the performance of 3D human reconstruction when applied to real-world imagery. We address this challenge by leveraging a new data resource: a number of social media dance videos that span diverse appearance, clothing styles, performances, and identities. Each video depicts dynamic movements of the body and clothes of a single person while lacking the 3D ground truth geometry. To learn a visual representation from these videos, we present a new self-supervised learning method that uses the local transformation that warps the predicted local geometry of the person from an image to that of another image at a different time instant. This allows self-supervision by enforcing a temporal coherence over the predictions. In addition, we jointly learn the depths along with the surface normals that are highly responsive to local texture, wrinkle, and shade by maximizing their geometric consistency. Our method is end-to-end trainable, resulting in high fidelity depth estimation that predicts fine geometry faithful to the input real image. We further provide a theoretical bound of self-supervised learning via an uncertainty analysis that characterizes the performance of the self-supervised learning without training. We demonstrate that our method outperforms the state-of-the-art human depth estimation and human shape recovery approaches on both real and rendered images.
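The abstract mentions jointly learning depths and surface normals "by maximizing their geometric consistency". As a rough illustration of what such a consistency term can look like, the sketch below derives normals from a depth map by finite differences and compares them with separately predicted normals using cosine similarity. This is not the authors' implementation: the orthographic projection, the function names `normals_from_depth` and `normal_consistency_loss`, and the `pixel_size` parameter are all assumptions made for this example.

```python
import numpy as np


def normals_from_depth(depth, pixel_size=1.0):
    """Estimate unit surface normals from a depth map.

    Assumption: orthographic projection, so the surface is z = depth(y, x)
    and the unnormalized normal is (-dz/dx, -dz/dy, 1).
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    dz_dy, dz_dx = np.gradient(depth, pixel_size)
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return normals / np.linalg.norm(normals, axis=-1, keepdims=True)


def normal_consistency_loss(depth, predicted_normals):
    """Mean (1 - cosine similarity) between depth-derived and predicted normals.

    The loss is 0 when the two normal fields agree everywhere.
    `predicted_normals` is assumed to hold unit vectors with shape (H, W, 3).
    """
    derived = normals_from_depth(depth)
    cosine = np.sum(derived * predicted_normals, axis=-1)
    return float(np.mean(1.0 - cosine))


if __name__ == "__main__":
    # A tilted plane: depth increases by 0.5 per pixel along x.
    ys, xs = np.mgrid[0:8, 0:8].astype(float)
    plane = 0.5 * xs
    print(normal_consistency_loss(plane, normals_from_depth(plane)))
```

Minimizing a term of this kind pushes the predicted depth and predicted normals toward describing the same surface; a perspective camera model would require back-projecting pixels to 3D points before differentiating.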
Pages: 8969-8983
Page count: 15
Related papers
50 in total
  • [1] Learning 3D Photography Videos via Self-supervised Diffusion on Single Images
    Wang, Xiaodong
    Wu, Chenfei
    Yin, Shengming
    Ni, Minheng
    Wang, Jianfeng
    Li, Linjie
    Yang, Zhengyuan
    Yang, Fan
    Wang, Lijuan
    Liu, Zicheng
    Fang, Yuejian
    Duan, Nan
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1506 - 1514
  • [2] Self-Supervised 3D Action Representation Learning With Skeleton Cloud Colorization
    Yang, Siyuan
    Liu, Jun
    Lu, Shijian
    Hwa, Er Meng
    Hu, Yongjian
    Kot, Alex C.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (01) : 509 - 524
  • [3] Self-supervised Secondary Landmark Detection via 3D Representation Learning
    Bala, Praneet
    Zimmermann, Jan
    Park, Hyun Soo
    Hayden, Benjamin Y.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 (08) : 1980 - 1994
  • [4] Self-supervised Secondary Landmark Detection via 3D Representation Learning
    Praneet Bala
    Jan Zimmermann
    Hyun Soo Park
    Benjamin Y. Hayden
    International Journal of Computer Vision, 2023, 131 : 1980 - 1994
  • [5] Self-supervised Adversarial Masking for 3D Point Cloud Representation Learning
    Szachniewicz, Michal
    Kozlowski, Wojciech
    Stypulkowski, Michal
    Zieba, Maciej
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024, 2024, 14796 : 156 - 168
  • [6] Modeling the Uncertainty for Self-supervised 3D Skeleton Action Representation Learning
    Su, Yukun
    Lin, Guosheng
    Sun, Ruizhou
    Hao, Yun
    Wu, Qingyao
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 769 - 778
  • [7] Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
    Qing, Zhiwu
    Zhang, Shiwei
    Huang, Ziyuan
    Xu, Yi
    Wang, Xiang
    Tang, Mingqian
    Gao, Changxin
    Jin, Rong
    Sang, Nong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13811 - 13821
  • [8] Self-supervised Representation Learning from Videos for Facial Action Unit Detection
    Li, Yong
    Zeng, Jiabei
    Shan, Shiguang
    Chen, Xilin
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10916 - 10925
  • [9] SCENE REPRESENTATION LEARNING FROM VIDEOS USING SELF-SUPERVISED AND WEAKLY-SUPERVISED TECHNIQUES
    Peri, Raghuveer
    Parthasarathy, Srinivas
    Sundaram, Shiva
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1671 - 1675
  • [10] Self-Supervised 3D Behavior Representation Learning Based on Homotopic Hyperbolic Embedding
    Chen, Jinghong
    Jin, Zhihao
    Wang, Qicong
    Meng, Hongying
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 6061 - 6074