Context-Aware Robust Fine-Tuning

Cited by: 0
Authors:
Xiaofeng Mao
Yufeng Chen
Xiaojun Jia
Rong Zhang
Hui Xue
Zhao Li
Affiliations:
[1] Alibaba Group
[2] Institute of Information Engineering, Chinese Academy of Sciences
[3] Zhejiang University
Keywords: Pre-trained models; CLIP; Fine-tuning; Robustness
DOI: not available
Abstract
Contrastive language-image pre-trained (CLIP) models have the zero-shot ability to classify an image as belonging to "[CLASS]" by using the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]". Based on exhaustive text cues in "[CONTEXT]", the CLIP model is aware of different contexts, e.g., background, style, and viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent works find that further fine-tuning of CLIP models improves accuracy but sacrifices robustness on downstream tasks. We conduct an empirical investigation showing that fine-tuning corrupts the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning to capture context information. Specifically, we use zero-shot prompt weights to obtain the context distribution contained in an image. By minimizing the Kullback–Leibler divergence (KLD) between the context distributions induced by the original and the fine-tuned CLIP models, CAR-FT lets the context-aware ability of CLIP be inherited by downstream tasks and achieves both higher in-distribution (ID) and out-of-distribution (OOD) accuracy. Experimental results show that CAR-FT achieves superior robustness on five OOD test datasets of ImageNet while also bringing accuracy gains on nine downstream tasks. Additionally, CAR-FT surpasses previous domain generalization (DG) methods and reaches 78.5% average accuracy on the DomainBed benchmark, setting a new state of the art.
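The regularizer described in the abstract can be summarized in a few lines. Below is a minimal sketch, assuming PyTorch and CLIP-style encoders; the names image_encoder, frozen_encoder, context_prompt_embeds, and class_weights are hypothetical placeholders, and the paper's exact prompt set, loss weighting, and logit scaling may differ.

```python
# Minimal sketch of a CAR-FT-style training loss, assuming PyTorch.
# All identifiers below are illustrative placeholders, not the authors' API.
import torch
import torch.nn.functional as F

def car_ft_loss(images, labels, image_encoder, frozen_encoder,
                context_prompt_embeds, class_weights, lam=1.0):
    """Cross-entropy on the fine-tuned features plus a KLD term that keeps
    the context distribution close to the one from the original CLIP model.

    context_prompt_embeds: (num_contexts, dim) normalized text embeddings of
        context prompts such as "a photo of", "a sketch of", ...
    class_weights: (num_classes, dim) classifier weights for the downstream task.
    """
    feats = F.normalize(image_encoder(images), dim=-1)       # fine-tuned features
    with torch.no_grad():                                    # original (frozen) CLIP
        feats0 = F.normalize(frozen_encoder(images), dim=-1)

    # Context distribution: softmax over similarities to the context prompts.
    # (A CLIP logit scale / temperature is omitted here for brevity.)
    ctx_logits = feats @ context_prompt_embeds.T
    ctx_logits0 = feats0 @ context_prompt_embeds.T
    kld = F.kl_div(F.log_softmax(ctx_logits, dim=-1),
                   F.softmax(ctx_logits0, dim=-1),
                   reduction="batchmean")

    # Standard classification loss on the downstream task.
    ce = F.cross_entropy(feats @ class_weights.T, labels)
    return ce + lam * kld
```

Detaching the frozen encoder's outputs keeps the zero-shot context distribution as a fixed target, so the KLD term constrains only the fine-tuned model rather than pulling both models together.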
Pages: 1685-1700 (15 pages)