Self-Supervised 3D Representation Learning of Dressed Humans From Social Media Videos

Citations: 0

Authors:

Jafarian, Yasamin [1]

Park, Hyun Soo [1]

Affiliations:

[1] Univ Minnesota, Minneapolis, MN 55455 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023, Vol. 45, No. 07

Keywords:

Depth estimation; dataset; high fidelity human reconstruction; normal estimation; single view 3D reconstruction; self-supervised learning

DOI:

10.1109/TPAMI.2022.3231558

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A key challenge of learning a visual representation for the 3D high fidelity geometry of dressed humans lies in the limited availability of ground truth data (e.g., 3D scanned models), which degrades the performance of 3D human reconstruction when applied to real-world imagery. We address this challenge by leveraging a new data resource: a number of social media dance videos that span diverse appearance, clothing styles, performances, and identities. Each video depicts dynamic movements of the body and clothes of a single person while lacking the 3D ground truth geometry. To learn a visual representation from these videos, we present a new self-supervised learning method that uses the local transformation that warps the predicted local geometry of the person from an image to that of another image at a different time instant. This allows self-supervision by enforcing temporal coherence over the predictions. In addition, we jointly learn the depths along with the surface normals that are highly responsive to local texture, wrinkle, and shade by maximizing their geometric consistency. Our method is end-to-end trainable, resulting in high fidelity depth estimation that predicts fine geometry faithful to the input real image. We further provide a theoretical bound of self-supervised learning via an uncertainty analysis that characterizes the performance of the self-supervised learning without training. We demonstrate that our method outperforms the state-of-the-art human depth estimation and human shape recovery approaches on both real and rendered images.

Pages: 8969-8983

Page count: 15

50 records in total

[1] Learning 3D Photography Videos via Self-supervised Diffusion on Single Images
Wang, Xiaodong
Wu, Chenfei
Yin, Shengming
Ni, Minheng
Wang, Jianfeng
Li, Linjie
Yang, Zhengyuan
Yang, Fan
Wang, Lijuan
Liu, Zicheng
Fang, Yuejian
Duan, Nan
PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1506 - 1514
[2] Self-Supervised 3D Action Representation Learning With Skeleton Cloud Colorization
Yang, Siyuan
Liu, Jun
Lu, Shijian
Hwa, Er Meng
Hu, Yongjian
Kot, Alex C.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (01) : 509 - 524
[3] Self-supervised Secondary Landmark Detection via 3D Representation Learning
Bala, Praneet
Zimmermann, Jan
Park, Hyun Soo
Hayden, Benjamin Y.
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 (08) : 1980 - 1994
[4] Self-supervised Secondary Landmark Detection via 3D Representation Learning
Praneet Bala
Jan Zimmermann
Hyun Soo Park
Benjamin Y. Hayden
International Journal of Computer Vision, 2023, 131 : 1980 - 1994
[5] Self-supervised Adversarial Masking for 3D Point Cloud Representation Learning
Szachniewicz, Michal
Kozlowski, Wojciech
Stypulkowski, Michal
Zieba, Maciej
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024, 2024, 14796 : 156 - 168
[6] Modeling the Uncertainty for Self-supervised 3D Skeleton Action Representation Learning
Su, Yukun
Lin, Guosheng
Sun, Ruizhou
Hao, Yun
Wu, Qingyao
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 769 - 778
[7] Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
Qing, Zhiwu
Zhang, Shiwei
Huang, Ziyuan
Xu, Yi
Wang, Xiang
Tang, Mingqian
Gao, Changxin
Jin, Rong
Sang, Nong
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13811 - 13821
[8] Self-supervised Representation Learning from Videos for Facial Action Unit Detection
Li, Yong
Zeng, Jiabei
Shan, Shiguang
Chen, Xilin
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10916 - 10925
[9] SCENE REPRESENTATION LEARNING FROM VIDEOS USING SELF-SUPERVISED AND WEAKLY-SUPERVISED TECHNIQUES
Peri, Raghuveer
Parthasarathy, Srinivas
Sundaram, Shiva
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1671 - 1675
[10] Self-Supervised 3D Behavior Representation Learning Based on Homotopic Hyperbolic Embedding
Chen, Jinghong
Jin, Zhihao
Wang, Qicong
Meng, Hongying
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 6061 - 6074
