Learning Customized Visual Models with Retrieval-Augmented Knowledge

Cited by: 5
Authors
Liu, Haotian [1]
Son, Kilho [2]
Yang, Jianwei [2]
Liu, Ce [2]
Gao, Jianfeng [2]
Lee, Yong Jae [1]
Li, Chunyuan [2]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Microsoft, Redmond, WA USA
DOI
10.1109/CVPR52729.2023.01454
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability. The high generality and usability of these visual models are achieved via a web-scale data collection process to ensure broad concept coverage, followed by expensive pre-training to feed all the knowledge into model weights. Alternatively, we propose REACT, REtrieval-Augmented CusTomization, a framework to acquire the relevant web knowledge to build customized visual models for target domains. We retrieve the most relevant image-text pairs (about 3% of CLIP pre-training data) from the web-scale database as external knowledge and propose to customize the model by only training new modularized blocks while freezing all the original weights. The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero-, few-, and full-shot settings. Particularly, on the zero-shot classification task, compared with CLIP, it achieves up to 5.4% improvement on ImageNet and 3.7% on the ELEVATER benchmark (20 datasets).
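The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that image-text pairs and the target concept are already embedded as vectors and that relevance is scored by cosine similarity; the abstract does not specify the scoring function, index, or data format, so the function names, the toy embeddings, and the value of k below are all hypothetical and are not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_top_k(query, database, k):
    """Return the k (caption, embedding) pairs most similar to the query embedding."""
    ranked = sorted(database, key=lambda pair: cosine(query, pair[1]), reverse=True)
    return ranked[:k]

# Toy stand-in for a web-scale image-text database (hypothetical embeddings).
database = [
    ("a photo of a tabby cat", [0.9, 0.1, 0.0]),
    ("a red sports car", [0.0, 0.2, 0.9]),
    ("a kitten on a sofa", [0.8, 0.3, 0.1]),
    ("a city skyline at night", [0.1, 0.9, 0.2]),
]

# Hypothetical embedding of a target-domain concept such as "cat".
query = [1.0, 0.2, 0.0]

top = retrieve_top_k(query, database, k=2)
print([caption for caption, _ in top])
```

In this toy setting the two cat-related captions rank highest. The retrieved subset would then serve as the external knowledge on which only the newly added blocks are trained, with the original weights kept frozen.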
Pages: 15148-15158
Page count: 11