iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-training for Visual Recognition

Cited by: 5

Authors:

Wei, Yixuan [1,2]

Cao, Yue [2]

Zhang, Zheng [2]

Peng, Houwen [2]

Yao, Zhuliang [1,2]

Xie, Zhenda [1,2]

Hu, Han [1,2]

Guo, Baining [1,2]

Affiliations:

[1] Tsinghua Univ, Beijing, Peoples R China

[2] Microsoft Res Asia, Beijing, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

DOI:

10.1109/CVPR52729.2023.00272

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents iCLIP, a method that effectively combines two prevalent visual recognition methods: image classification and contrastive language-image pre-training (CLIP). Instead of naive multi-task learning that uses a separate head for each task, we fuse the two tasks in a deep fashion, adapting image classification to share the same formula and the same model weights with language-image pre-training. To further bridge the two tasks, we propose to enhance the category names in image classification tasks using external knowledge, such as their descriptions in dictionaries. Extensive experiments show that the proposed method combines the advantages of the two tasks well: the strong discrimination ability of image classification, due to its clean category labels, and the good zero-shot ability of CLIP, ascribed to the richer semantics in its text descriptions. In particular, it reaches 82.9% top-1 accuracy on IN-1K while surpassing CLIP by 1.8%, with similar model size, on zero-shot recognition on the Kornblith 12-dataset benchmark. The code and models are publicly available at https://github.com/weiyx16/iCLIP.
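
The idea of letting image classification "share the same formula" with language-image pre-training can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository above for that); it only assumes that both tasks score an image embedding against text embeddings by temperature-scaled cosine similarity and train with cross-entropy. The names `toy_text_encoder`, `enrich`, `cosine_logits`, the dictionary entry, and the temperature value are all hypothetical and chosen for illustration.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm or 1.0) for x in v]

def toy_text_encoder(text, dim=8):
    # Hypothetical stand-in for a learned text encoder: a character histogram.
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec

def enrich(name, dictionary):
    # Append a dictionary description to a category name when one exists.
    desc = dictionary.get(name)
    return f"{name}, {desc}" if desc else name

def cosine_logits(image_emb, text_embs, temperature=0.07):
    # One image embedding scored against every text embedding.
    img = l2_normalize(image_emb)
    return [
        sum(a * b for a, b in zip(img, l2_normalize(t))) / temperature
        for t in text_embs
    ]

def cross_entropy(logits, target):
    # Numerically stable softmax cross-entropy for a single example.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]

# Classification sample: the "classifier weights" are text embeddings of
# (enriched) category names, so no separate classification head is needed.
class_names = ["tench", "goldfish"]
dictionary = {"tench": "a freshwater fish of the carp family"}  # made-up entry
class_embs = [toy_text_encoder(enrich(n, dictionary)) for n in class_names]

# Placeholder for the output of an image encoder (assumption).
image_emb = toy_text_encoder("tench photo")
cls_loss = cross_entropy(cosine_logits(image_emb, class_embs), target=0)

# Image-text sample: the same formula, with the batch captions as candidates.
# Only the image-to-text direction is shown here.
captions = ["a tench caught in a lake", "a dog on grass"]
cap_embs = [toy_text_encoder(c) for c in captions]
clip_loss = cross_entropy(cosine_logits(image_emb, cap_embs), target=0)

print(cls_loss, clip_loss)
```

In this sketch the only difference between the two tasks is where the candidate texts come from (category names versus captions), which is the sense in which a single set of weights could serve both.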

Pages: 2776-2786

Page count: 11

