How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?

Cited by: 8
|
Authors
Ming, Yifei [1 ]
Li, Yixuan [1 ]
Affiliations
[1] Univ Wisconsin Madison, Dept Comp Sci, Madison, WI 53715 USA
Funding
National Science Foundation (USA);
Keywords
CLIP; OOD detection; Fine-tuning; Multi-modality; Vision-language models; Prompt learning; Few-shot learning; Adapter;
DOI
10.1007/s11263-023-01895-7
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent large vision-language models such as CLIP have shown remarkable out-of-distribution (OOD) detection and generalization performance. However, their zero-shot in-distribution (ID) accuracy is often limited on downstream datasets. Recent CLIP-based fine-tuning methods such as prompt learning have demonstrated significant improvements in ID classification and OOD generalization where OOD labels are available. Nonetheless, it remains unclear whether the model is reliable against semantic shifts without OOD labels. In this paper, we aim to bridge this gap and present a comprehensive study of how fine-tuning impacts OOD detection for few-shot downstream tasks. By framing OOD detection as multi-modal concept matching, we establish a connection between fine-tuning methods and various OOD scores. Our results suggest that a proper choice of OOD score is essential for CLIP-based fine-tuning. In particular, the maximum concept matching (MCM) score consistently provides a promising solution. We also show that prompt learning achieves state-of-the-art OOD detection performance, surpassing the zero-shot counterpart.
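The abstract frames OOD detection as multi-modal concept matching: an image embedding is compared against the text embeddings of the ID class names ("concepts"), and the maximum softmax-scaled similarity serves as the OOD score. A minimal NumPy sketch of this MCM score is given below; the function name `mcm_score` and the `temperature` default are illustrative assumptions, not code from the paper, and in practice the embeddings would come from CLIP's image and text encoders.

```python
import numpy as np

def mcm_score(image_feat, text_feats, temperature=1.0):
    """Maximum Concept Matching (MCM) OOD score: the highest
    temperature-scaled softmax probability over cosine similarities
    between an image embedding and concept (class-name) text embeddings.
    High score -> likely in-distribution; low score -> likely OOD."""
    # L2-normalize so dot products become cosine similarities
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sims = txt @ img                       # one cosine similarity per concept
    logits = sims / temperature            # temperature scaling
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return probs.max()                     # MCM score in (0, 1]
```

At test time, an input is flagged as OOD when its score falls below a threshold chosen on ID validation data; a lower temperature sharpens the softmax and widens the gap between matched and unmatched concepts.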
Pages: 596-609
Page count: 14
Related papers
50 results total
  • [1] How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models?
    Yifei Ming
    Yixuan Li
    International Journal of Computer Vision, 2024, 132 : 596 - 609
  • [2] Delving into Out-of-Distribution Detection with Vision-Language Representations
    Ming, Yifei
    Cai, Ziyang
    Gu, Jiuxiang
    Sun, Yiyou
    Li, Wei
    Li, Yixuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [3] Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization
    Zhu, Beier
    Niu, Yulei
    Lee, Saeil
    Hur, Minhoe
    Zhang, Hanwang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3834 - 3842
  • [4] Robust Fine-Tuning of Vision-Language Models for Domain Generalization
    Vogt-Lowell, Kevin
    Lee, Noah
    Tsiligkaridis, Theodoros
    Vaillant, Marc
    2023 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE, HPEC, 2023,
  • [5] Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
    Zhou, Andy
    Wang, Jindong
    Wang, Yu-Xiong
    Wang, Haohan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data
    Kong, Lingkai
    Jiang, Haoming
    Zhuang, Yuchen
    Lyu, Jie
    Zhao, Tuo
    Zhang, Chao
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1326 - 1340
  • [7] Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection
    Zhang, Zihan
    Xu, Zhuo
    Xiang, Xiang
    COMPUTER VISION - ECCV 2024, PT LXXXV, 2025, 15143 : 273 - 291
  • [8] A survey of efficient fine-tuning methods for Vision-Language Models - Prompt and Adapter
    Xing, Jialu
    Liu, Jianping
    Wang, Jian
    Sun, Lulu
    Chen, Xi
    Gu, Xunxun
    Wang, Yingfei
    COMPUTERS & GRAPHICS-UK, 2024, 119
  • [9] SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
    Nguyen, Bac
    Uhlich, Stefan
    Cardinaux, Fabien
    Mauch, Lukas
    Edraki, Marzieh
    Courville, Aaron
    COMPUTER VISION - ECCV 2024, PT LXIX, 2025, 15127 : 138 - 154
  • [10] Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering?
    Jensen, Kristian Norgaard
    Plank, Barbara
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1496 - 1508