Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Authors
Yue, Yuanwen [1]
Das, Anurag [2]
Engelmann, Francis [1,3]
Tang, Siyu [1]
Lenssen, Jan Eric [2]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Max Planck Inst Informat, Saarland Informat Campus, Saarbrucken, Germany
[3] Google, Zurich, Switzerland
Source
COMPUTER VISION - ECCV 2024, PT II | 2025 / Vol. 15060
Keywords
Representation learning; Foundation models; Gaussian splatting; Scene understanding
DOI
10.1007/978-3-031-72627-9_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current visual foundation models are trained purely on unstructured 2D data, limiting their understanding of the 3D structure of objects and scenes. In this work, we show that fine-tuning on 3D-aware data improves the quality of emerging semantic features. We design a method to lift semantic 2D features into an efficient 3D Gaussian representation, which allows us to re-render them for arbitrary views. Using the rendered 3D-aware features, we design a fine-tuning strategy to transfer such 3D awareness into a 2D foundation model. We demonstrate that models fine-tuned in this way produce features that readily improve downstream task performance in semantic segmentation and depth estimation through simple linear probing. Notably, though fine-tuned on a single indoor dataset, the improvement is transferable to a variety of indoor datasets and out-of-domain datasets. We hope our study encourages the community to consider injecting 3D awareness when training 2D foundation models. Project page: https://ywyue.github.io/FiT3D.
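The abstract evaluates features through "simple linear probing". As a rough illustration of that general idea only, and not the authors' evaluation code, the sketch below trains nothing but a linear softmax classifier on top of fixed feature vectors. The feature vectors are synthetic stand-ins for per-pixel features from a frozen backbone; the names `train_linear_probe`, `class_scores`, and `predict`, the cluster centers, and all hyperparameters are assumptions made for this example.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(x, weights, biases):
    """Linear layer: one dot product plus bias per class."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)]

def train_linear_probe(features, labels, num_classes, epochs=20, lr=0.05):
    """Fit only a linear classifier with SGD; the features stay fixed."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(class_scores(x, weights, biases))
            for c in range(num_classes):
                # Gradient of cross-entropy with respect to the score of class c
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights, biases

def predict(x, weights, biases):
    """Return the index of the highest-scoring class."""
    scores = class_scores(x, weights, biases)
    return scores.index(max(scores))

# Hypothetical stand-in for per-pixel features from a frozen backbone:
# three classes clustered around fixed centers with unit Gaussian noise.
rng = random.Random(0)
centers = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0]]
labels = [i % 3 for i in range(300)]
features = [[c + rng.gauss(0.0, 1.0) for c in centers[y]] for y in labels]

weights, biases = train_linear_probe(features, labels, num_classes=3)
accuracy = sum(predict(x, weights, biases) == y
               for x, y in zip(features, labels)) / len(labels)
print(f"probe accuracy: {accuracy:.3f}")
```

Because the backbone is never updated in such a probe, any accuracy difference between two feature extractors reflects the features themselves rather than the capacity of the classifier, which is why linear probing is a common way to compare representations.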
Pages: 57-74
Page count: 18