Learning Customized Visual Models with Retrieval-Augmented Knowledge

Cited by: 5
Authors
Liu, Haotian [1]
Son, Kilho [2]
Yang, Jianwei [2]
Liu, Ce [2]
Gao, Jianfeng [2]
Lee, Yong Jae [1]
Li, Chunyuan [2]
Affiliations
[1] Univ Wisconsin Madison, Madison, WI 53706 USA
[2] Microsoft, Redmond, WA USA
DOI
10.1109/CVPR52729.2023.01454
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability. The high generality and usability of these visual models is achieved via a web-scale data collection process to ensure broad concept coverage, followed by expensive pre-training to feed all the knowledge into model weights. Alternatively, we propose REACT, REtrieval-Augmented CusTomization, a framework to acquire the relevant web knowledge to build customized visual models for target domains. We retrieve the most relevant image-text pairs (~3% of CLIP pre-training data) from the web-scale database as external knowledge and propose to customize the model by only training new modularized blocks while freezing all the original weights. The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero, few, and full-shot settings. Particularly, on the zero-shot classification task, compared with CLIP, it achieves up to 5.4% improvement on ImageNet and 3.7% on the ELEVATER benchmark (20 datasets).
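The retrieval step described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the abstract does not say how relevance is scored, so cosine similarity over embeddings is an assumption here, and the function names, pair ids, and 2-D vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_top_k(query_vecs, database, k):
    """Collect the ids of the k most similar database pairs for each query.

    query_vecs: embeddings of target-domain concepts (hypothetical input).
    database:   list of (pair_id, embedding) entries standing in for
                web image-text pairs.
    """
    selected = []
    for q in query_vecs:
        ranked = sorted(database, key=lambda item: cosine(q, item[1]), reverse=True)
        for pair_id, _ in ranked[:k]:
            if pair_id not in selected:  # keep each pair once across queries
                selected.append(pair_id)
    return selected

# Toy 2-D embeddings, invented purely for illustration.
database = [("cat_photo", [1.0, 0.0]), ("dog_photo", [0.9, 0.1]),
            ("car_photo", [0.0, 1.0]), ("truck_photo", [0.1, 0.9])]
queries = [[1.0, 0.05]]  # one target-domain concept
subset = retrieve_top_k(queries, database, k=2)
print(subset)
```

In this toy setup only the retrieved subset would be used for customization; the other half of the method in the abstract, training new blocks while the original weights stay frozen, is not shown.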
Pages: 15148-15158
Page count: 11
Related papers
50 in total
  • [41] Improving knowledge management in building engineering with hybrid retrieval-augmented generation framework
    Wang, Zhiqi
    Liu, Zhongcun
    Lu, Weizhen
    Jia, Lu
    JOURNAL OF BUILDING ENGINEERING, 2025, 103
  • [42] Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking
    King, Brendan
    Flanigan, Jeffrey
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 5570 - 5583
  • [43] REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
    Hu, Ziniu
    Iscen, Ahmet
    Sun, Chen
    Wang, Zirui
    Chang, Kai-Wei
    Sun, Yizhou
    Schmid, Cordelia
    Ross, David A.
    Fathi, Alireza
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23369 - 23379
  • [44] Integrating Graph Retrieval-Augmented Generation With Large Language Models for Supplier Discovery
    Li, Yunqing
    Ko, Hyunwoong
    Ameri, Farhad
    JOURNAL OF COMPUTING AND INFORMATION SCIENCE IN ENGINEERING, 2025, 25 (02)
  • [45] Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training
    Fang, Feiteng
    Bai, Yuelin
    Ni, Shiwen
    Yang, Min
    Chen, Xiaojun
    Xu, Ruifeng
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 10028 - 10039
  • [46] Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models
    Kim, Gangwoo
    Kim, Sungdong
    Jeon, Byeongguk
    Park, Joonsuk
    Kang, Jaewoo
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 996 - 1009
  • [47] ReACC: A Retrieval-Augmented Code Completion Framework
    Lu, Shuai
    Duan, Nan
    Han, Hojae
    Guo, Daya
    Hwang, Seung-won
    Svyatkovskiy, Alexey
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 6227 - 6240
  • [48] Retrieval-Augmented Text-to-Audio Generation
    Yuan, Yi
    Liu, Haohe
    Liu, Xubo
    Huang, Qiushi
    Plumbley, Mark D.
    Wang, Wenwu
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 581 - 585
  • [49] Recent Advances in Retrieval-Augmented Text Generation
    Cai, Deng
    Wang, Yan
    Liu, Lemao
    Shi, Shuming
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 3417 - 3419
  • [50] Retrieval-augmented Recommender System: Enhancing Recommender Systems with Large Language Models
    Di Palma, Dario
    PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023, 2023, : 1369 - 1373