Exploring Effective Factors for Improving Visual In-Context Learning

Cited: 0
Authors
Sun, Yanpeng [1 ]
Chen, Qiang [2 ]
Wang, Jian [2 ]
Wang, Jingdong [2 ]
Li, Zechao [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Baidu, Visual Technol Dept, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Adaptation models; Computational modeling; Predictive models; Computer vision; Cognition; Semantics; Prompt engineering; Context modeling; Training; Visual in-context learning; large-scale vision model; in-context learning; prompt selection; prompt fusion; SHOT; NETWORK;
DOI
10.1109/TIP.2025.3554410
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In-Context Learning (ICL) aims to understand a new task from a few demonstrations (i.e., a prompt) and to make predictions on new inputs without tuning the model. While it has been widely studied in NLP, it remains a relatively new research area in computer vision. To reveal the factors influencing the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors with a direct impact on inference performance. Prompt selection is the process of choosing the most suitable prompt for a query image. This is crucial because high-quality prompts help large-scale vision models comprehend new tasks quickly and accurately. Prompt fusion combines the prompt and the query image to activate knowledge stored within the large-scale vision model; changing the fusion method significantly affects performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use pixel-level retrieval to select a suitable prompt, then apply different prompt fusion methods to activate the diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different fusion methods to produce the final result. We conduct extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms the meta-learning-based method OSLSM on 1-shot segmentation for the first time, indicating the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
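A minimal sketch of the inference pipeline summarized in the abstract, assuming a pretrained large vision model that inpaints the blank quadrant of a fused canvas. The `model` callable, the support pool, the cosine-similarity retrieval metric, and the list of cell orderings (`orders`) are illustrative assumptions, not the authors' released implementation:

```python
# Sketch of prompt-SelF inference: pixel-level prompt retrieval,
# multiple prompt-fusion arrangements, and prediction ensembling.
import torch
import torch.nn.functional as F


def pixel_level_retrieval(query_img, support_imgs):
    """Pick the support example whose pixels best match the query.

    query_img: (3, H, W) tensor; support_imgs: (N, 3, H, W) tensor.
    Here similarity is cosine similarity over flattened pixels (an assumption).
    """
    q = F.normalize(query_img.flatten().unsqueeze(0), dim=-1)   # (1, 3*H*W)
    s = F.normalize(support_imgs.flatten(start_dim=1), dim=-1)  # (N, 3*H*W)
    return int(torch.argmax(q @ s.t(), dim=-1))


def fuse_prompt(prompt_img, prompt_label, query_img, order):
    """Arrange (prompt image, prompt label, query image, blank) into a 2x2 canvas.

    `order` permutes the four cells; each ordering is one fusion method.
    Returns the canvas and the position of the blank cell to be inpainted.
    """
    blank = torch.zeros_like(query_img)
    cells = [prompt_img, prompt_label, query_img, blank]
    arranged = [cells[i] for i in order]
    top = torch.cat(arranged[:2], dim=-1)     # concatenate along width
    bottom = torch.cat(arranged[2:], dim=-1)
    canvas = torch.cat([top, bottom], dim=-2)  # concatenate along height
    return canvas, order.index(3)


def prompt_self(query_img, support_imgs, support_labels, model, orders):
    """Retrieve one prompt, fuse it under several arrangements, and ensemble."""
    idx = pixel_level_retrieval(query_img, support_imgs)
    preds = []
    for order in orders:
        canvas, blank_cell = fuse_prompt(support_imgs[idx], support_labels[idx],
                                         query_img, order)
        # `model` is a hypothetical callable that inpaints the blank cell and
        # returns a prediction with the same shape as the query image.
        preds.append(model(canvas, blank_cell))
    return torch.stack(preds).mean(dim=0)  # ensemble by averaging predictions
```

For example, passing `orders=[(0, 1, 2, 3), (1, 0, 3, 2)]` evaluates two complementary arrangements of the prompt pair and the query before averaging their predictions; the released code should be consulted for the exact arrangements and the underlying vision model.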
Pages: 2147-2160
Page count: 14
Related papers
50 records in total
  • [21] Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks. Chang, Kai-Wei; Hsu, Ming-Hao; Li, Shan-Wen; Lee, Hung-yi. INTERSPEECH 2024, 2024: 4139-4143.
  • [22] Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model. Gu, Zheng; Yang, Shiyuan; Liao, Jing; Huo, Jing; Gao, Yang. ACM TRANSACTIONS ON GRAPHICS, 2024, 43 (04).
  • [23] ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation. Sun, Yasheng; Yang, Yifan; Peng, Houwen; Shen, Yifei; Yang, Yuqing; Hu, Han; Qiu, Lili; Koike, Hideki. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
  • [24] Exploring Diverse In-Context Configurations for Image Captioning. Yang, Xu; Wu, Yongliang; Yang, Mingzhuo; Chen, Haokun; Geng, Xin. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
  • [25] Improving isolated and in-context classification of handwritten characters. Mazalov, Vadim; Watt, Stephen M. DOCUMENT RECOGNITION AND RETRIEVAL XIX, 2012, 8297.
  • [26] Learning In-context Learning for Named Entity Recognition. Chen, Jiawei; Lu, Yaojie; Lin, Hongyu; Lou, Jie; Jia, Wei; Dai, Dai; Wu, Hua; Cao, Boxi; Han, Xianpei; Sun, Le. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 13661-13675.
  • [27] OLIVE: Object Level In-Context Visual Embeddings. Ossowski, Timothy; Hu, Junjie. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 5170-5185.
  • [28] Towards More Unified In-context Visual Understanding. Sheng, Dianmo; Chen, Dongdong; Tan, Zhentao; Liu, Qiankun; Chu, Qi; Bao, Jianmin; Gong, Tao; Liu, Bin; Xu, Shengwei; Yu, Nenghai. 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 13362-13372.
  • [29] Dissecting In-Context Learning of Translations in GPTs. Raunak, Vikas; Awadalla, Hany Hassan; Menezes, Arul. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023: 866-872.
  • [30] Unified Demonstration Retriever for In-Context Learning. Li, Xiaonan; Lv, Kai; Yan, Hang; Lin, Tianyang; Wei, Zhu; Ni, Yuan; Xie, Guotong; Wang, Xiaoling; Qiu, Xipeng. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 4644-4668.