How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?

Cited by: 8
|
Authors
Ming, Yifei [1 ]
Li, Yixuan [1 ]
Affiliations
[1] Univ Wisconsin Madison, Dept Comp Sci, Madison, WI 53715 USA
Funding
National Science Foundation (USA);
Keywords
CLIP; OOD detection; Fine-tuning; Multi-modality; Vision-language models; Prompt learning; Few-shot learning; Adapter;
DOI
10.1007/s11263-023-01895-7
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent large vision-language models such as CLIP have shown remarkable out-of-distribution (OOD) detection and generalization performance. However, their zero-shot in-distribution (ID) accuracy is often limited on downstream datasets. Recent CLIP-based fine-tuning methods such as prompt learning have demonstrated significant improvements in ID classification and OOD generalization where OOD labels are available. Nonetheless, it remains unclear whether the model is reliable against semantic shifts without OOD labels. In this paper, we aim to bridge this gap and present a comprehensive study of how fine-tuning impacts OOD detection for few-shot downstream tasks. By framing OOD detection as multi-modal concept matching, we establish a connection between fine-tuning methods and various OOD scores. Our results suggest that a proper choice of OOD score is essential for CLIP-based fine-tuning. In particular, the maximum concept matching (MCM) score consistently provides a promising solution. We also show that prompt learning achieves state-of-the-art OOD detection performance, surpassing the zero-shot counterpart.
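The abstract frames OOD detection as multi-modal concept matching: an image embedding is compared against the text embeddings of the ID class names ("concepts"), and the maximum softmax-scaled similarity serves as the OOD score. A minimal NumPy sketch of this MCM score is given below; the function name `mcm_score` and the `temperature` default are illustrative assumptions, not code from the paper, and in practice the embeddings would come from CLIP's image and text encoders.

```python
import numpy as np

def mcm_score(image_feat, text_feats, temperature=1.0):
    """Maximum Concept Matching (MCM) OOD score: the highest
    temperature-scaled softmax probability over cosine similarities
    between an image embedding and concept (class-name) text embeddings.
    High score -> likely in-distribution; low score -> likely OOD."""
    # L2-normalize so dot products become cosine similarities
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sims = txt @ img                       # one cosine similarity per concept
    logits = sims / temperature            # temperature scaling
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return probs.max()                     # MCM score in (0, 1]
```

At test time, an input is flagged as OOD when its score falls below a threshold chosen on ID validation data; a lower temperature sharpens the softmax and widens the gap between matched and unmatched concepts.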
Pages: 596-609
Page count: 14
Related papers
50 results total
  • [1] How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?
    Yifei Ming
    Yixuan Li
    International Journal of Computer Vision, 2024, 132 : 596 - 609
  • [2] Delving into Out-of-Distribution Detection with Vision-Language Representations
    Ming, Yifei
    Cai, Ziyang
    Gu, Jiuxiang
    Sun, Yiyou
    Li, Wei
    Li, Yixuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [3] Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization
    Zhu, Beier
    Niu, Yulei
    Lee, Saeil
    Hur, Minhoe
    Zhang, Hanwang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3834 - 3842
  • [4] Robust Fine-Tuning of Vision-Language Models for Domain Generalization
    Vogt-Lowell, Kevin
    Lee, Noah
    Tsiligkaridis, Theodoros
    Vaillant, Marc
    2023 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE, HPEC, 2023,
  • [5] Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
    Zhou, Andy
    Wang, Jindong
    Wang, Yu-Xiong
    Wang, Haohan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data
    Kong, Lingkai
    Jiang, Haoming
    Zhuang, Yuchen
    Lyu, Jie
    Zhao, Tuo
    Zhang, Chao
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1326 - 1340
  • [7] Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection
    Zhang, Zihan
    Xu, Zhuo
    Xiang, Xiang
    COMPUTER VISION - ECCV 2024, PT LXXXV, 2025, 15143 : 273 - 291
  • [8] A survey of efficient fine-tuning methods for Vision-Language Models - Prompt and Adapter
    Xing, Jialu
    Liu, Jianping
    Wang, Jian
    Sun, Lulu
    Chen, Xi
    Gu, Xunxun
    Wang, Yingfei
    COMPUTERS & GRAPHICS-UK, 2024, 119
  • [9] SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
    Nguyen, Bac
    Uhlich, Stefan
    Cardinaux, Fabien
    Mauch, Lukas
    Edraki, Marzieh
    Courville, Aaron
    COMPUTER VISION - ECCV 2024, PT LXIX, 2025, 15127 : 138 - 154
  • [10] Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering?
    Jensen, Kristian Norgaard
    Plank, Barbara
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1496 - 1508