Exploring Effective Factors for Improving Visual In-Context Learning

Cited: 0
Authors
Sun, Yanpeng [1 ]
Chen, Qiang [2 ]
Wang, Jian [2 ]
Wang, Jingdong [2 ]
Li, Zechao [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Baidu, Visual Technol Dept, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Adaptation models; Computational modeling; Predictive models; Computer vision; Cognition; Semantics; Prompt engineering; Context modeling; Training; Visual in-context learning; large-scale vision model; in-context learning; prompt selection; prompt fusion; SHOT; NETWORK;
DOI
10.1109/TIP.2025.3554410
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
In-Context Learning (ICL) is the ability to understand a new task from a few demonstrations (a.k.a. a prompt) and to make predictions on new inputs without tuning the model. While it has been widely studied in NLP, it is still a relatively new area of research in computer vision. To reveal the factors influencing the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors with a direct impact on inference performance. Prompt selection is the process of choosing the most suitable prompt for a query image. This is crucial because high-quality prompts help large-scale vision models comprehend new tasks rapidly and accurately. Prompt fusion involves combining the prompt and the query image to activate knowledge within a large-scale vision model; however, changing the fusion method significantly affects performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use a pixel-level retrieval method to select a suitable prompt, then apply different prompt fusion methods to activate the diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different fusion methods to produce the final result. We conducted extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms the meta-learning-based OSLSM method on 1-shot segmentation for the first time, indicating the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
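As a rough illustration of the three-stage pipeline the abstract describes, here is a minimal sketch in PyTorch-style Python. All names (`select_prompt`, `prompt_self`, `encoder`, `icl_model`, `arrangements`) are hypothetical placeholders, not the authors' actual API; the paper's exact retrieval metric, fusion layouts, and ensembling rule may differ from what is assumed below.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the prompt-SelF pipeline. `encoder` stands in for an
# unspecified dense feature extractor and `icl_model` for a frozen large-scale
# vision model; both are assumptions, not the paper's concrete components.

def select_prompt(query_img, candidates, encoder):
    """Pixel-level retrieval: pick the candidate whose dense features are
    most similar to the query's (mean per-pixel cosine similarity; assumes
    all images are encoded at the same spatial resolution)."""
    q = F.normalize(encoder(query_img), dim=1)        # (1, C, H, W)
    best_idx, best_score = 0, -1.0
    for i, (img, label) in enumerate(candidates):
        c = F.normalize(encoder(img), dim=1)          # (1, C, H, W)
        score = (q * c).sum(dim=1).mean().item()      # mean pixel-wise cosine sim
        if score > best_score:
            best_idx, best_score = i, score
    return candidates[best_idx]

def prompt_self(query_img, candidates, encoder, icl_model, arrangements):
    """Select one prompt, fuse it with the query under several canvas
    layouts, run the frozen model on each, and average the predictions."""
    prompt_img, prompt_label = select_prompt(query_img, candidates, encoder)
    preds = []
    for arrange, recover in arrangements:  # each layout: build canvas / undo it
        canvas = arrange(prompt_img, prompt_label, query_img)
        pred = icl_model(canvas)           # frozen large-scale vision model
        preds.append(recover(pred))        # map prediction back to the query
    return torch.stack(preds).mean(dim=0)  # ensemble the per-layout predictions
```

The point of the ensemble step is the one made in the abstract: each fusion layout activates different knowledge in the frozen model, so averaging their predictions is more robust than committing to a single arrangement.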
Pages: 2147 - 2160
Number of pages: 14