Exploring Effective Factors for Improving Visual In-Context Learning

Cited by: 0
Authors
Sun, Yanpeng [1 ]
Chen, Qiang [2 ]
Wang, Jian [2 ]
Wang, Jingdong [2 ]
Li, Zechao [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Baidu, Visual Technol Dept, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Adaptation models; Computational modeling; Predictive models; Computer vision; Cognition; Semantics; Prompt engineering; Context modeling; Training; Visual in-context learning; large-scale vision model; in-context learning; prompt selection; prompt fusion; SHOT; NETWORK;
DOI
10.1109/TIP.2025.3554410
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In-Context Learning (ICL) aims to understand a new task from a few demonstrations (a.k.a. prompts) and to predict on new inputs without tuning the model. While ICL has been widely studied in NLP, it is still a relatively new area of research in computer vision. To reveal the factors influencing the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors with a direct impact on inference performance. Prompt selection is the process of choosing the most suitable prompt for a query image; this is crucial because high-quality prompts help large-scale vision models comprehend new tasks rapidly and accurately. Prompt fusion combines prompts and query images to activate knowledge within large-scale vision models; however, changing the fusion method significantly affects performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use pixel-level retrieval to select a suitable prompt, then apply different prompt fusion methods to activate the diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different fusion methods into the final prediction. We conducted extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms the meta-learning-based OSLSM method on 1-shot segmentation for the first time, indicating the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
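The abstract describes a three-step pipeline (select, fuse, ensemble); the sketch below renders it in Python. It is a minimal illustration under stated assumptions, not the paper's implementation: the raw-pixel L2 retrieval, the two canvas layouts, and the `model` callable are hypothetical stand-ins (see the linked repository for the actual code).

```python
# Minimal sketch of the prompt-SelF pipeline described in the abstract.
# Assumptions: images are equal-shape numpy arrays; `model(canvas, layout)`
# is a hypothetical interface to a frozen large-scale vision model that
# in-paints the blank quadrant and returns a prediction of fixed shape.
import numpy as np

def select_prompt(query: np.ndarray, candidates: list[np.ndarray]) -> int:
    """Pixel-level retrieval: pick the candidate closest to the query
    image in raw pixel space (L2 distance over the flattened arrays)."""
    dists = [np.linalg.norm(query.astype(np.float32) - c.astype(np.float32))
             for c in candidates]
    return int(np.argmin(dists))

def fuse(prompt_img: np.ndarray, prompt_lbl: np.ndarray,
         query: np.ndarray, layout: str) -> np.ndarray:
    """Prompt fusion: stitch the demonstration pair and the query into one
    canvas. Different spatial layouts present the same demonstration
    differently, activating different knowledge in the model."""
    blank = np.zeros_like(query)  # region the model must fill in
    if layout == "prompt_top":    # [prompt_img | prompt_lbl]
        canvas = np.vstack([np.hstack([prompt_img, prompt_lbl]),
                            np.hstack([query, blank])])
    elif layout == "prompt_left":
        canvas = np.hstack([np.vstack([prompt_img, prompt_lbl]),
                            np.vstack([query, blank])])
    else:
        raise ValueError(f"unknown layout: {layout}")
    return canvas

def prompt_self(query, cand_imgs, cand_lbls, model,
                layouts=("prompt_top", "prompt_left")):
    """Select one prompt, fuse it with the query under several layouts,
    and ensemble the per-layout predictions by pixel-wise averaging."""
    i = select_prompt(query, cand_imgs)
    preds = [model(fuse(cand_imgs[i], cand_lbls[i], query, lay), lay)
             for lay in layouts]
    return np.mean(preds, axis=0)  # ensembled final prediction
```

Pixel-wise averaging is one plausible ensembling choice here; the key idea is that each layout elicits a slightly different prediction from the same frozen model, and aggregating them makes the final output more robust than any single arrangement.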
Pages: 2147-2160
Number of pages: 14