Exploring Effective Factors for Improving Visual In-Context Learning

Citations: 0
Authors
Sun, Yanpeng [1 ]
Chen, Qiang [2 ]
Wang, Jian [2 ]
Wang, Jingdong [2 ]
Li, Zechao [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Baidu, Visual Technol Dept, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Adaptation models; Computational modeling; Predictive models; Computer vision; Cognition; Semantics; Prompt engineering; Context modeling; Training; Visual in-context learning; large-scale vision model; in-context learning; prompt selection; prompt fusion; SHOT; NETWORK;
DOI
10.1109/TIP.2025.3554410
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In-Context Learning (ICL) understands a new task from a few demonstrations (a.k.a. a prompt) and makes predictions on new inputs without tuning the model. While ICL has been widely studied in NLP, it remains a relatively new research area in computer vision. To reveal the factors influencing the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors that directly affect inference performance. Prompt selection is the process of choosing the most suitable prompt for a query image; this is crucial because a high-quality prompt helps a large-scale vision model comprehend a new task quickly and accurately. Prompt fusion combines the prompt and the query image to activate knowledge stored in the large-scale vision model; changing the fusion method significantly affects performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use pixel-level retrieval to select a suitable prompt, then apply different prompt fusion methods to activate the diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different fusion methods into the final prediction. We conduct extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms the meta-learning-based method OSLSM on 1-shot segmentation for the first time, indicating the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
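The pipeline sketched in the abstract (pixel-level retrieval, multiple prompt-fusion arrangements, prediction ensemble) can be illustrated with a minimal Python sketch. This is not the authors' released implementation: the retrieval resolution, the `model.predict` interface, and the `arrangements` callables are assumptions made for illustration; the paper's actual fusion layouts and voting scheme may differ.

```python
import numpy as np

def pixel_level_retrieval(query_img, candidates, size=(32, 32)):
    """Select the candidate whose downsampled pixels best match the query
    (cosine similarity). `candidates` is a list of {"image", "label"} dicts."""
    def flat(img):
        # Crude strided downsample; a real implementation would use a
        # proper resize (e.g., PIL or cv2).
        h, w = img.shape[:2]
        ys = np.linspace(0, h - 1, size[0]).astype(int)
        xs = np.linspace(0, w - 1, size[1]).astype(int)
        v = img[ys][:, xs].astype(np.float32).ravel()
        return v / (np.linalg.norm(v) + 1e-8)

    q = flat(query_img)
    scores = [float(q @ flat(c["image"])) for c in candidates]
    return candidates[int(np.argmax(scores))]

def prompt_self(query_img, candidates, model, arrangements):
    """Hypothetical prompt-SelF loop: select one prompt, fuse it with the
    query under several spatial arrangements, ensemble the predictions."""
    prompt = pixel_level_retrieval(query_img, candidates)
    masks = []
    for arrange in arrangements:
        # `arrange` composes the prompt image, prompt label, and query into
        # one canvas; `model.predict` (assumed interface) inpaints the
        # missing region and returns a binary mask for the query.
        canvas = arrange(prompt["image"], prompt["label"], query_img)
        masks.append(model.predict(canvas, arrange))
    # Majority vote over the per-arrangement binary masks.
    stacked = np.stack(masks).astype(np.float32)
    return (stacked.mean(axis=0) > 0.5).astype(np.uint8)
```

As described in the abstract, each arrangement is a different spatial layout of the same prompt-query pair, so the ensemble aggregates the diverse knowledge activated by each layout rather than different prompts.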
Pages: 2147-2160
Page count: 14
Related Papers
50 records in total
  • [1] Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation
    Suo, Wei
    Lai, Lanqing
    Sun, Mengyang
    Zhang, Hanwang
    Wang, Peng
    Zhang, Yanning
    COMPUTER VISION-ECCV 2024, PT XLVI, 2025, 15104 : 18 - 35
  • [2] Complementary Explanations for Effective In-Context Learning
    Ye, Xi
    Iyer, Srinivasan
    Celikyilmaz, Asli
    Stoyanov, Ves
    Durrett, Greg
    Pasunuru, Ramakanth
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 4469 - 4484
  • [3] Exploring In-Context Learning for Knowledge Grounded Dialog Generation
    Chen, Qinyu
    Wu, Wenhao
    Li, Sujian
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 10071 - 10081
  • [4] What Makes Good Examples for Visual In-Context Learning?
    Zhang, Yuanhan
    Zhou, Kaiyang
    Liu, Ziwei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] In-Context In-Context Learning with Transformer Neural Processes
    Ashman, Matthew
    Diaconu, Cristiana
    Weller, Adrian
    Turner, Richard E.
    SYMPOSIUM ON ADVANCES IN APPROXIMATE BAYESIAN INFERENCE, 2024, 253 : 1 - 29
  • [6] Visual In-Context Learning for Large Vision-Language Models
    Zhou, Yucheng
    Li, Xiang
    Wang, Qianning
    Shen, Jianbing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 15890 - 15902
  • [7] Images Speak in Images: A Generalist Painter for In-Context Visual Learning
    Wang, Xinlong
    Wang, Wen
    Cao, Yue
    Shen, Chunhua
    Huang, Tiejun
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6830 - 6839
  • [8] Instruct Me More! Random Prompting for Visual In-Context Learning
    Zhang, Jiahao
    Wang, Bowen
    Li, Liangzhi
    Nakashima, Yuta
    Nagahara, Hajime
    arXiv, 2023,
  • [9] The Learnability of In-Context Learning
    Wies, Noam
    Levine, Yoav
    Shashua, Amnon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [10] A glance at in-context learning
    Wu, Yongliang
    Yang, Xu
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (05)