Learning Customized Visual Models with Retrieval-Augmented Knowledge

Cited by: 5
Authors
Liu, Haotian [1]
Son, Kilho [2]
Yang, Jianwei [2]
Liu, Ce [2]
Gao, Jianfeng [2]
Lee, Yong Jae [1]
Li, Chunyuan [2]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Microsoft, Redmond, WA USA
DOI
10.1109/CVPR52729.2023.01454
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability. The high generality and usability of these visual models are achieved via a web-scale data collection process to ensure broad concept coverage, followed by expensive pre-training to feed all the knowledge into model weights. Alternatively, we propose REACT, REtrieval-Augmented CusTomization, a framework to acquire the relevant web knowledge to build customized visual models for target domains. We retrieve the most relevant image-text pairs (~3% of CLIP pre-training data) from the web-scale database as external knowledge and propose to customize the model by training only new modularized blocks while freezing all the original weights. The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection, and segmentation tasks, including zero-, few-, and full-shot settings. In particular, on the zero-shot classification task, compared with CLIP, it achieves up to 5.4% improvement on ImageNet and 3.7% on the ELEVATER benchmark (20 datasets).
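The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the retrieval system, so the sketch assumes that target-domain concepts and pool items are already embedded as plain vectors and that relevance is scored by cosine similarity. The function names `cosine` and `retrieve_top_k` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_embeddings: Sequence[Sequence[float]],
    pool_embeddings: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return sorted, de-duplicated indices of the k most similar pool
    items for each query (hypothetical helper, for illustration only)."""
    selected = set()
    for query in query_embeddings:
        # Rank every pool item by similarity to this query, best first.
        ranked = sorted(
            range(len(pool_embeddings)),
            key=lambda i: cosine(query, pool_embeddings[i]),
            reverse=True,
        )
        selected.update(ranked[:k])
    return sorted(selected)


# Toy data (assumed): one target-domain query and three pool items.
queries = [[1.0, 0.0]]
pool = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
subset = retrieve_top_k(queries, pool, k=2)
print(subset)  # indices of the retrieved pool items
```

In the framework the abstract describes, a retrieved subset of this kind would then be used to train newly added blocks while the original model weights stay frozen; that training step is not shown here.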
Pages: 15148-15158
Page count: 11