Exploring Effective Factors for Improving Visual In-Context Learning

Citations: 0
Authors
Sun, Yanpeng [1 ]
Chen, Qiang [2 ]
Wang, Jian [2 ]
Wang, Jingdong [2 ]
Li, Zechao [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Baidu, Visual Technol Dept, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Adaptation models; Computational modeling; Predictive models; Computer vision; Cognition; Semantics; Prompt engineering; Context modeling; Training; Visual in-context learning; large-scale vision model; in-context learning; prompt selection; prompt fusion; SHOT; NETWORK;
DOI
10.1109/TIP.2025.3554410
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In-Context Learning (ICL) understands a new task from a few demonstrations (a.k.a. a prompt) and makes predictions on new inputs without tuning the model. While ICL has been widely studied in NLP, it remains a relatively new research area in computer vision. To reveal the factors influencing the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors that directly affect inference performance. Prompt selection is the process of choosing the most suitable prompt for a query image; this is crucial because a high-quality prompt helps a large-scale vision model comprehend a new task quickly and accurately. Prompt fusion combines the prompt and the query image to activate knowledge stored in the large-scale vision model; changing the fusion method significantly affects performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use pixel-level retrieval to select a suitable prompt, then apply different prompt fusion methods to activate the diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different fusion methods into the final prediction. We conduct extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms the meta-learning-based method OSLSM on 1-shot segmentation for the first time, indicating the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
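The pipeline sketched in the abstract (pixel-level retrieval, multiple prompt-fusion arrangements, prediction ensemble) can be illustrated with a minimal Python sketch. This is not the authors' released implementation: the retrieval resolution, the `model.predict` interface, and the `arrangements` callables are assumptions made for illustration; the paper's actual fusion layouts and voting scheme may differ.

```python
import numpy as np

def pixel_level_retrieval(query_img, candidates, size=(32, 32)):
    """Select the candidate whose downsampled pixels best match the query
    (cosine similarity). `candidates` is a list of {"image", "label"} dicts."""
    def flat(img):
        # Crude strided downsample; a real implementation would use a
        # proper resize (e.g., PIL or cv2).
        h, w = img.shape[:2]
        ys = np.linspace(0, h - 1, size[0]).astype(int)
        xs = np.linspace(0, w - 1, size[1]).astype(int)
        v = img[ys][:, xs].astype(np.float32).ravel()
        return v / (np.linalg.norm(v) + 1e-8)

    q = flat(query_img)
    scores = [float(q @ flat(c["image"])) for c in candidates]
    return candidates[int(np.argmax(scores))]

def prompt_self(query_img, candidates, model, arrangements):
    """Hypothetical prompt-SelF loop: select one prompt, fuse it with the
    query under several spatial arrangements, ensemble the predictions."""
    prompt = pixel_level_retrieval(query_img, candidates)
    masks = []
    for arrange in arrangements:
        # `arrange` composes the prompt image, prompt label, and query into
        # one canvas; `model.predict` (assumed interface) inpaints the
        # missing region and returns a binary mask for the query.
        canvas = arrange(prompt["image"], prompt["label"], query_img)
        masks.append(model.predict(canvas, arrange))
    # Majority vote over the per-arrangement binary masks.
    stacked = np.stack(masks).astype(np.float32)
    return (stacked.mean(axis=0) > 0.5).astype(np.uint8)
```

As described in the abstract, each arrangement is a different spatial layout of the same prompt-query pair, so the ensemble aggregates the diverse knowledge activated by each layout rather than different prompts.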
Pages: 2147-2160
Page count: 14
Related Papers
50 records in total
  • [1] Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation
    Suo, Wei
    Lai, Lanqing
    Sun, Mengyang
    Zhang, Hanwang
    Wang, Peng
    Zhang, Yanning
    COMPUTER VISION-ECCV 2024, PT XLVI, 2025, 15104 : 18 - 35
  • [2] Complementary Explanations for Effective In-Context Learning
    Ye, Xi
    Iyer, Srinivasan
    Celikyilmaz, Asli
    Stoyanov, Ves
    Durrett, Greg
    Pasunuru, Ramakanth
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 4469 - 4484
  • [3] Exploring In-Context Learning for Knowledge Grounded Dialog Generation
    Chen, Qinyu
    Wu, Wenhao
    Li, Sujian
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 10071 - 10081
  • [4] What Makes Good Examples for Visual In-Context Learning?
    Zhang, Yuanhan
    Zhou, Kaiyang
    Liu, Ziwei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] In-Context In-Context Learning with Transformer Neural Processes
    Ashman, Matthew
    Diaconu, Cristiana
    Weller, Adrian
    Turner, Richard E.
    SYMPOSIUM ON ADVANCES IN APPROXIMATE BAYESIAN INFERENCE, 2024, 253 : 1 - 29
  • [6] Visual In-Context Learning for Large Vision-Language Models
    Zhou, Yucheng
    Li, Xiang
    Wang, Qianning
    Shen, Jianbing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 15890 - 15902
  • [7] Images Speak in Images: A Generalist Painter for In-Context Visual Learning
    Wang, Xinlong
    Wang, Wen
    Cao, Yue
    Shen, Chunhua
    Huang, Tiejun
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6830 - 6839
  • [8] Instruct Me More! Random Prompting for Visual In-Context Learning
    Zhang, Jiahao
    Wang, Bowen
    Li, Liangzhi
    Nakashima, Yuta
    Nagahara, Hajime
    arXiv, 2023,
  • [9] The Learnability of In-Context Learning
    Wies, Noam
    Levine, Yoav
    Shashua, Amnon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [10] A glance at in-context learning
    Wu, Yongliang
    Yang, Xu
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (05)