Fine-tuning 3D foundation models for geometric object retrieval

Cited by: 0
Authors
Van den Herrewegen, Jarne [1 ,2 ]
Tourwe, Tom [1 ]
Ovsjanikov, Maks [3 ]
Wyffels, Francis [2 ]
Affiliations
[1] Oqton AI, Edegem, Belgium
[2] Ghent Univ Imec, AI & Robot Lab, IDLab AIRO, Zwijnaarde, Belgium
[3] Ecole Polytech, LIX, Palaiseau, France
Source
COMPUTERS & GRAPHICS-UK | 2024, Vol. 122
Keywords
Object retrieval; Deep learning; 3D; Transfer learning; Foundation models; Self-supervised learning; NEURAL-NETWORK;
DOI
10.1016/j.cag.2024.103993
CLC number (Chinese Library Classification)
TP31 [Computer software];
Discipline code
081202 ; 0835 ;
Abstract
Foundation models such as ULIP-2 (Xue et al., 2023) have recently pushed the field of 3D deep learning forward. These models are trained on significantly more data and show superior representation learning capacity in many downstream tasks, such as 3D shape classification and few-shot part segmentation. A particular characteristic of recent 3D foundation models is that they are typically multi-modal, involving image (2D) as well as caption (text) branches, which leads to an intricate interplay that benefits all modalities. At the same time, the 3D encoders embedded in these foundation models are, on their own, not well understood. Specifically, there is little analysis of either the utility of the pre-trained 3D features these models provide or their capacity to adapt to new downstream 3D data. Furthermore, existing studies typically focus on label-oriented downstream tasks, such as shape classification, and ignore other critical applications, such as 3D content-based object retrieval. In this paper, we fill this gap and show, for the first time, how 3D foundation models can be leveraged for strong 3D-to-3D retrieval performance on seven different datasets, on par with state-of-the-art view-based architectures. We evaluate both the pre-trained foundation models and their versions fine-tuned on downstream data. We compare supervised fine-tuning using classification labels against two self-supervised, label-free fine-tuning methods. Importantly, we introduce and describe a fine-tuning methodology, which we found crucial for making transfer learning from 3D foundation models work in a stable manner.
Pages: 10
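The abstract describes 3D-to-3D retrieval with embeddings from a pre-trained or fine-tuned 3D encoder. As a rough illustration of that setup (not the paper's actual pipeline), the sketch below embeds point clouds with a placeholder encoder and ranks gallery shapes by cosine similarity; the ToyPointEncoder, the retrieve helper, and all tensor shapes are illustrative assumptions standing in for the 3D branch of a foundation model such as ULIP-2.

```python
# Minimal sketch of embedding-based 3D-to-3D retrieval, assuming a point-cloud encoder.
# The encoder below is a stand-in; in practice it would be the pre-trained or fine-tuned
# 3D branch of a foundation model (e.g., ULIP-2), not this toy MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyPointEncoder(nn.Module):
    """Placeholder for a 3D encoder mapping a point cloud to a global embedding."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, embed_dim))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> per-point features -> max-pooled global embedding (B, D)
        return self.mlp(points).max(dim=1).values


@torch.no_grad()
def retrieve(encoder: nn.Module, queries: torch.Tensor, gallery: torch.Tensor, k: int = 5):
    """Rank gallery shapes for each query by cosine similarity of their embeddings."""
    encoder.eval()
    q = F.normalize(encoder(queries), dim=-1)   # (Q, D), L2-normalized query embeddings
    g = F.normalize(encoder(gallery), dim=-1)   # (G, D), L2-normalized gallery embeddings
    sims = q @ g.T                              # (Q, G) cosine similarity matrix
    return sims.topk(k, dim=-1)                 # top-k scores and gallery indices per query


if __name__ == "__main__":
    enc = ToyPointEncoder()
    queries = torch.randn(4, 1024, 3)    # 4 query point clouds, 1024 points each
    gallery = torch.randn(100, 1024, 3)  # 100 gallery point clouds
    scores, indices = retrieve(enc, queries, gallery, k=5)
    print(indices)  # indices of the 5 most similar gallery shapes for each query
```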
Related papers
50 items in total
  • [1] FedPFT: Federated Proxy Fine-Tuning of Foundation Models
    Peng, Zhaopeng
    Fan, Xiaoliang
    Chen, Yufan
    Wang, Zheng
    Pan, Shirui
    Wen, Chenglu
    Zhang, Ruisheng
    Wang, Cheng
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 4806 - 4814
  • [2] Foundation Models and Fine-Tuning: A Benchmark for Out of Distribution Detection
    Cappio Borlino, Francesco
    Lu, Lorenzo
    Tommasi, Tatiana
    IEEE ACCESS, 2024, 12 : 79401 - 79414
  • [3] A COMPARATIVE STUDY OF RECOGNITION MODELS BASED ON FINE-TUNING 3D CNNS FOR FARMING BEHAVIORS
    Su, Shibin
    Hu, Xiaonan
    Li, Xiang
    APPLIED ENGINEERING IN AGRICULTURE, 2023, 39 (01) : 121 - 132
  • [4] Fine-Tuning Foundation Models With Confidence Assessment for Enhanced Semantic Segmentation
    Dionelis, Nikolaos
    Longepe, Nicolas
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2025, 22
  • [5] Stereoscopic video quality measurement with fine-tuning 3D ResNets
    Imani, Hassan
    Islam, Md Baharul
    Junayed, Masum Shah
    Aydin, Tarkan
    Arica, Nafiz
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (29) : 42849 - 42869
  • [6] Fine-Tuning Large Language Models for Private Document Retrieval: A Tutorial
    Sommers, Frank
    Kongthon, Alisa
    Kongyoung, Sarawoot
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 1319 - 1320
  • [7] Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
    Tang, Yiwen
    Zhang, Ray
    Guo, Zoey
    Ma, Xianzheng
    Zhao, Bin
    Wang, Zhigang
    Wang, Dong
    Li, Xuelong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5171 - 5179
  • [8] The Impact of Fine-Tuning in Embedding Models on Query-Dependent Retrieval Performance
    Alkan, Berkin
    Ozcan, Onur
    Karatas, Yahya Bahachr
    Unal, Muhammed Cihat
    Kolukisa, Oguz
    Karakaya, Ismail
    Karamanlioglu, Alper
    Demirel, Berkan
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024,
  • [9] 3D sketching for 3D object retrieval
    Li, Bo
    Yuan, Juefei
    Ye, Yuxiang
    Lu, Yijuan
    Zhang, Chaoyang
    Tian, Qi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 9569 - 9595