Exploring Effective Factors for Improving Visual In-Context Learning

Cited by: 0
Authors
Sun, Yanpeng [1 ]
Chen, Qiang [2 ]
Wang, Jian [2 ]
Wang, Jingdong [2 ]
Li, Zechao [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Baidu, Visual Technol Dept, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Adaptation models; Computational modeling; Predictive models; Computer vision; Cognition; Semantics; Prompt engineering; Context modeling; Training; Visual in-context learning; large-scale vision model; in-context learning; prompt selection; prompt fusion; SHOT; NETWORK;
DOI
10.1109/TIP.2025.3554410
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In-Context Learning (ICL) aims to understand a new task from a few demonstrations (a.k.a. prompts) and to predict on new inputs without tuning the model. While ICL has been widely studied in NLP, it is still a relatively new area of research in computer vision. To reveal the factors influencing the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors with a direct impact on inference performance. Prompt selection is the process of choosing the most suitable prompt for a query image; this is crucial because high-quality prompts help large-scale vision models comprehend new tasks rapidly and accurately. Prompt fusion combines prompts and query images to activate knowledge within large-scale vision models; however, changing the fusion method significantly affects performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use pixel-level retrieval to select a suitable prompt, then apply different prompt fusion methods to activate the diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different fusion methods into the final prediction. We conducted extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms the meta-learning-based OSLSM method on 1-shot segmentation for the first time, indicating the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
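The abstract describes a three-step pipeline (select, fuse, ensemble); the sketch below renders it in Python. It is a minimal illustration under stated assumptions, not the paper's implementation: the raw-pixel L2 retrieval, the two canvas layouts, and the `model` callable are hypothetical stand-ins (see the linked repository for the actual code).

```python
# Minimal sketch of the prompt-SelF pipeline described in the abstract.
# Assumptions: images are equal-shape numpy arrays; `model(canvas, layout)`
# is a hypothetical interface to a frozen large-scale vision model that
# in-paints the blank quadrant and returns a prediction of fixed shape.
import numpy as np

def select_prompt(query: np.ndarray, candidates: list[np.ndarray]) -> int:
    """Pixel-level retrieval: pick the candidate closest to the query
    image in raw pixel space (L2 distance over the flattened arrays)."""
    dists = [np.linalg.norm(query.astype(np.float32) - c.astype(np.float32))
             for c in candidates]
    return int(np.argmin(dists))

def fuse(prompt_img: np.ndarray, prompt_lbl: np.ndarray,
         query: np.ndarray, layout: str) -> np.ndarray:
    """Prompt fusion: stitch the demonstration pair and the query into one
    canvas. Different spatial layouts present the same demonstration
    differently, activating different knowledge in the model."""
    blank = np.zeros_like(query)  # region the model must fill in
    if layout == "prompt_top":    # [prompt_img | prompt_lbl]
        canvas = np.vstack([np.hstack([prompt_img, prompt_lbl]),
                            np.hstack([query, blank])])
    elif layout == "prompt_left":
        canvas = np.hstack([np.vstack([prompt_img, prompt_lbl]),
                            np.vstack([query, blank])])
    else:
        raise ValueError(f"unknown layout: {layout}")
    return canvas

def prompt_self(query, cand_imgs, cand_lbls, model,
                layouts=("prompt_top", "prompt_left")):
    """Select one prompt, fuse it with the query under several layouts,
    and ensemble the per-layout predictions by pixel-wise averaging."""
    i = select_prompt(query, cand_imgs)
    preds = [model(fuse(cand_imgs[i], cand_lbls[i], query, lay), lay)
             for lay in layouts]
    return np.mean(preds, axis=0)  # ensembled final prediction
```

Pixel-wise averaging is one plausible ensembling choice here; the key idea is that each layout elicits a slightly different prediction from the same frozen model, and aggregating them makes the final output more robust than any single arrangement.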
Pages: 2147-2160
Number of pages: 14