Exploring Effective Factors for Improving Visual In-Context Learning

Cited: 0
Authors
Sun, Yanpeng [1 ]
Chen, Qiang [2 ]
Wang, Jian [2 ]
Wang, Jingdong [2 ]
Li, Zechao [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Baidu, Visual Technol Dept, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Adaptation models; Computational modeling; Predictive models; Computer vision; Cognition; Semantics; Prompt engineering; Context modeling; Training; Visual in-context learning; large-scale vision model; in-context learning; prompt selection; prompt fusion; SHOT; NETWORK;
DOI
10.1109/TIP.2025.3554410
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
In-Context Learning (ICL) is the ability to understand a new task from a few demonstrations (a.k.a. a prompt) and to make predictions on new inputs without tuning the model. While it has been widely studied in NLP, it is still a relatively new area of research in computer vision. To reveal the factors influencing the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors with a direct impact on inference performance. Prompt selection is the process of choosing the most suitable prompt for a query image. This is crucial because high-quality prompts help large-scale vision models comprehend new tasks rapidly and accurately. Prompt fusion involves combining the prompt and the query image to activate knowledge within a large-scale vision model; however, changing the fusion method significantly affects performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use a pixel-level retrieval method to select a suitable prompt, then apply different prompt fusion methods to activate the diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different fusion methods to produce the final result. We conducted extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms the meta-learning-based OSLSM method on 1-shot segmentation for the first time, indicating the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
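As a rough illustration of the three-stage pipeline the abstract describes, here is a minimal sketch in PyTorch-style Python. All names (`select_prompt`, `prompt_self`, `encoder`, `icl_model`, `arrangements`) are hypothetical placeholders, not the authors' actual API; the paper's exact retrieval metric, fusion layouts, and ensembling rule may differ from what is assumed below.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the prompt-SelF pipeline. `encoder` stands in for an
# unspecified dense feature extractor and `icl_model` for a frozen large-scale
# vision model; both are assumptions, not the paper's concrete components.

def select_prompt(query_img, candidates, encoder):
    """Pixel-level retrieval: pick the candidate whose dense features are
    most similar to the query's (mean per-pixel cosine similarity; assumes
    all images are encoded at the same spatial resolution)."""
    q = F.normalize(encoder(query_img), dim=1)        # (1, C, H, W)
    best_idx, best_score = 0, -1.0
    for i, (img, label) in enumerate(candidates):
        c = F.normalize(encoder(img), dim=1)          # (1, C, H, W)
        score = (q * c).sum(dim=1).mean().item()      # mean pixel-wise cosine sim
        if score > best_score:
            best_idx, best_score = i, score
    return candidates[best_idx]

def prompt_self(query_img, candidates, encoder, icl_model, arrangements):
    """Select one prompt, fuse it with the query under several canvas
    layouts, run the frozen model on each, and average the predictions."""
    prompt_img, prompt_label = select_prompt(query_img, candidates, encoder)
    preds = []
    for arrange, recover in arrangements:  # each layout: build canvas / undo it
        canvas = arrange(prompt_img, prompt_label, query_img)
        pred = icl_model(canvas)           # frozen large-scale vision model
        preds.append(recover(pred))        # map prediction back to the query
    return torch.stack(preds).mean(dim=0)  # ensemble the per-layout predictions
```

The point of the ensemble step is the one made in the abstract: each fusion layout activates different knowledge in the frozen model, so averaging their predictions is more robust than committing to a single arrangement.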
Pages: 2147 - 2160
Number of pages: 14