Exploring Effective Factors for Improving Visual In-Context Learning

Cited: 0
Authors
Sun, Yanpeng [1 ]
Chen, Qiang [2 ]
Wang, Jian [2 ]
Wang, Jingdong [2 ]
Li, Zechao [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Baidu, Visual Technol Dept, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Adaptation models; Computational modeling; Predictive models; Computer vision; Cognition; Semantics; Prompt engineering; Context modeling; Training; Visual in-context learning; large-scale vision model; in-context learning; prompt selection; prompt fusion; SHOT; NETWORK;
DOI
10.1109/TIP.2025.3554410
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In-Context Learning (ICL) aims to understand a new task from a few demonstrations (i.e., a prompt) and to make predictions on new inputs without tuning the model. While it has been widely studied in NLP, it remains a relatively new research area in computer vision. To reveal the factors influencing the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors with a direct impact on inference performance. Prompt selection is the process of choosing the most suitable prompt for a query image. This is crucial because high-quality prompts help large-scale vision models comprehend new tasks quickly and accurately. Prompt fusion combines the prompt and the query image to activate knowledge stored within the large-scale vision model; changing the fusion method significantly affects performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use pixel-level retrieval to select a suitable prompt, then apply different prompt fusion methods to activate the diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different fusion methods to produce the final result. We conduct extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms the meta-learning-based method OSLSM on 1-shot segmentation for the first time, indicating the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
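A minimal sketch of the inference pipeline summarized in the abstract, assuming a pretrained large vision model that inpaints the blank quadrant of a fused canvas. The `model` callable, the support pool, the cosine-similarity retrieval metric, and the list of cell orderings (`orders`) are illustrative assumptions, not the authors' released implementation:

```python
# Sketch of prompt-SelF inference: pixel-level prompt retrieval,
# multiple prompt-fusion arrangements, and prediction ensembling.
import torch
import torch.nn.functional as F


def pixel_level_retrieval(query_img, support_imgs):
    """Pick the support example whose pixels best match the query.

    query_img: (3, H, W) tensor; support_imgs: (N, 3, H, W) tensor.
    Here similarity is cosine similarity over flattened pixels (an assumption).
    """
    q = F.normalize(query_img.flatten().unsqueeze(0), dim=-1)   # (1, 3*H*W)
    s = F.normalize(support_imgs.flatten(start_dim=1), dim=-1)  # (N, 3*H*W)
    return int(torch.argmax(q @ s.t(), dim=-1))


def fuse_prompt(prompt_img, prompt_label, query_img, order):
    """Arrange (prompt image, prompt label, query image, blank) into a 2x2 canvas.

    `order` permutes the four cells; each ordering is one fusion method.
    Returns the canvas and the position of the blank cell to be inpainted.
    """
    blank = torch.zeros_like(query_img)
    cells = [prompt_img, prompt_label, query_img, blank]
    arranged = [cells[i] for i in order]
    top = torch.cat(arranged[:2], dim=-1)     # concatenate along width
    bottom = torch.cat(arranged[2:], dim=-1)
    canvas = torch.cat([top, bottom], dim=-2)  # concatenate along height
    return canvas, order.index(3)


def prompt_self(query_img, support_imgs, support_labels, model, orders):
    """Retrieve one prompt, fuse it under several arrangements, and ensemble."""
    idx = pixel_level_retrieval(query_img, support_imgs)
    preds = []
    for order in orders:
        canvas, blank_cell = fuse_prompt(support_imgs[idx], support_labels[idx],
                                         query_img, order)
        # `model` is a hypothetical callable that inpaints the blank cell and
        # returns a prediction with the same shape as the query image.
        preds.append(model(canvas, blank_cell))
    return torch.stack(preds).mean(dim=0)  # ensemble by averaging predictions
```

For example, passing `orders=[(0, 1, 2, 3), (1, 0, 3, 2)]` evaluates two complementary arrangements of the prompt pair and the query before averaging their predictions; the released code should be consulted for the exact arrangements and the underlying vision model.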
Pages: 2147-2160
Page count: 14
Related papers
50 records in total
  • [21] Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks. Chang, Kai-Wei; Hsu, Ming-Hao; Li, Shan-Wen; Lee, Hung-yi. INTERSPEECH 2024, 2024: 4139-4143.
  • [22] Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model. Gu, Zheng; Yang, Shiyuan; Liao, Jing; Huo, Jing; Gao, Yang. ACM TRANSACTIONS ON GRAPHICS, 2024, 43 (04).
  • [23] ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation. Sun, Yasheng; Yang, Yifan; Peng, Houwen; Shen, Yifei; Yang, Yuqing; Hu, Han; Qiu, Lili; Koike, Hideki. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
  • [24] Exploring Diverse In-Context Configurations for Image Captioning. Yang, Xu; Wu, Yongliang; Yang, Mingzhuo; Chen, Haokun; Geng, Xin. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
  • [25] Improving isolated and in-context classification of handwritten characters. Mazalov, Vadim; Watt, Stephen M. DOCUMENT RECOGNITION AND RETRIEVAL XIX, 2012, 8297.
  • [26] Learning In-context Learning for Named Entity Recognition. Chen, Jiawei; Lu, Yaojie; Lin, Hongyu; Lou, Jie; Jia, Wei; Dai, Dai; Wu, Hua; Cao, Boxi; Han, Xianpei; Sun, Le. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 13661-13675.
  • [27] OLIVE: Object Level In-Context Visual Embeddings. Ossowski, Timothy; Hu, Junjie. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 5170-5185.
  • [28] Towards More Unified In-context Visual Understanding. Sheng, Dianmo; Chen, Dongdong; Tan, Zhentao; Liu, Qiankun; Chu, Qi; Bao, Jianmin; Gong, Tao; Liu, Bin; Xu, Shengwei; Yu, Nenghai. 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 13362-13372.
  • [29] Dissecting In-Context Learning of Translations in GPTs. Raunak, Vikas; Awadalla, Hany Hassan; Menezes, Arul. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023: 866-872.
  • [30] Unified Demonstration Retriever for In-Context Learning. Li, Xiaonan; Lv, Kai; Yan, Hang; Lin, Tianyang; Wei, Zhu; Ni, Yuan; Xie, Guotong; Wang, Xiaoling; Qiu, Xipeng. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 4644-4668.