Context-Aware Robust Fine-Tuning

Cited by: 0

Authors:

Xiaofeng Mao

Yufeng Chen

Xiaojun Jia

Rong Zhang

Hui Xue

Zhao Li

Affiliations:

[1] Alibaba Group

[2] Institute of Information Engineering, Chinese Academy of Sciences

[3] Zhejiang University

Source:

International Journal of Computer Vision | 2024 / Vol. 132

Keywords:

Pre-trained models; CLIP; Fine-tuning; Robustness

DOI:

Not available

Abstract:

Contrastive language-image pre-trained (CLIP) models can classify an image as belonging to "[CLASS]" in a zero-shot manner by using the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]". Drawing on the exhaustive text cues in "[CONTEXT]", a CLIP model is aware of different contexts, e.g. background, style, and viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent work finds that further fine-tuning of CLIP models improves accuracy but sacrifices robustness on downstream tasks. We conduct an empirical investigation showing that fine-tuning corrupts the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning to capture context information. Specifically, we use zero-shot prompt weights to obtain the context distribution contained in the image. By minimizing the Kullback–Leibler divergence (KLD) between the context distributions induced by the original and fine-tuned CLIP models, CAR-FT lets downstream tasks inherit the context-aware ability of CLIP and achieves both higher in-distribution (ID) and out-of-distribution (OOD) accuracy. Experimental results show that CAR-FT achieves superior robustness on five OOD test datasets of ImageNet while also bringing accuracy gains on nine downstream tasks. Additionally, CAR-FT surpasses previous domain generalization (DG) methods and reaches 78.5% average accuracy on the DomainBed benchmark, setting a new state of the art.
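
The regularizer described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random arrays stand in for CLIP image features and for the zero-shot context prompt weights, and the function names, the temperature value, and the direction of the divergence (original model as reference) are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def context_log_probs(image_features, context_weights, temperature=0.01):
    # Log of the context distribution: softmax over cosine similarities
    # between image features and context prompt embeddings.
    # temperature=0.01 is an assumed value, not taken from the abstract.
    img = l2_normalize(image_features)
    ctx = l2_normalize(context_weights)
    return log_softmax(img @ ctx.T / temperature)


def context_kl(log_p_ref, log_p_ft):
    # Batch mean of KL(p_ref || p_ft); the direction is an assumption.
    kl_per_image = np.sum(np.exp(log_p_ref) * (log_p_ref - log_p_ft), axis=-1)
    return float(kl_per_image.mean())


rng = np.random.default_rng(0)
context_weights = rng.normal(size=(4, 8))    # 4 hypothetical context prompts
feats_pretrained = rng.normal(size=(5, 8))   # stand-in for original CLIP features
feats_finetuned = feats_pretrained + 0.1 * rng.normal(size=(5, 8))  # drifted features

log_p_ref = context_log_probs(feats_pretrained, context_weights)
log_p_ft = context_log_probs(feats_finetuned, context_weights)

loss_same = context_kl(log_p_ref, log_p_ref)   # identical models: no penalty
loss_drift = context_kl(log_p_ref, log_p_ft)   # drifted features: positive penalty
print(loss_same, loss_drift)
```

In an actual fine-tuning loop a term like this would presumably be added to the task loss with some weighting, so that the fine-tuned encoder is penalized when its context distribution moves away from that of the original model. The abstract does not specify that weighting.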

Pages: 1685-1700

Page count: 15

[1] Context-Aware Robust Fine-Tuning
Mao, Xiaofeng
Chen, Yufeng
Jia, Xiaojun
Zhang, Rong
Xue, Hui
Li, Zhao
International Journal of Computer Vision, 2024, 132 (05): 1685-1700