Learning to Learn Better Visual Prompts

Cited: 0
Authors
Wang, Fengxiang [1 ]
Huang, Wanrong [1 ]
Yang, Shaowu [1 ]
Qi, Fan [2 ]
Lan, Long [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp Sci & Technol, HPCL, Changsha, Hunan, Peoples R China
[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt tuning provides a low-cost way of adapting vision-language models (VLMs) to various downstream vision tasks without updating the huge set of pre-trained parameters. Dispensing with conventional manual prompt crafting, the recent prompt tuning method Context Optimization (CoOp) introduces learnable vectors as text prompts. Nevertheless, several previous works point out that CoOp-based approaches easily overfit to the base classes and generalize poorly to novel classes. In this paper, we argue that prompt tuning works well only on the base classes because of the limited capacity of the learnable vectors. Moreover, the pre-trained model is around a hundred times larger than the learnable vector, so the learned vector has very limited ability to absorb knowledge of novel classes. To reduce this overfitting of textual knowledge to the base classes, we view prompt tuning as learning to learn (LoL) and learn the prompt by meta-learning: training on many different subsets of the base classes fully exploits the limited capacity of prompt tuning and thereby transfers its power to recognizing novel classes. Specifically, we first fine-tune pre-trained CLIP on the base classes with the CoOp method. Then, starting from the fine-tuned CLIP model, we perform further fine-tuning on the base classes in an N-way K-shot manner, following the meta-learning paradigm. Finally, we apply the learned textual vector and the VLM to unseen classes. Extensive experiments on benchmark datasets validate the efficacy of our meta-learning-informed prompt tuning, confirming its role as a robust optimization strategy for VLMs.
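To make the episodic training described in the abstract concrete, below is a minimal, self-contained PyTorch sketch of CoOp-style prompt tuning run in an N-way K-shot episodic loop. It is illustrative only, not the authors' code: the frozen linear "image encoder", the fixed class-name embeddings, the mean-pooled toy text encoder, and the synthetic episode data are stand-ins for CLIP's pre-trained towers and a real base-class dataset, and all dimensions and hyperparameters are arbitrary assumptions rather than the paper's settings.

```python
# Hedged sketch: CoOp-style learnable context vectors trained episodically
# (N-way K-shot) while the pre-trained model stays frozen. The encoder,
# class-name embeddings, and episode data are synthetic stand-ins for CLIP.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
DIM, N_CTX, N_BASE = 64, 4, 20   # feature dim, context length, # base classes
N_WAY, SHOTS = 5, 6              # classes per episode, images per class

# Frozen stand-in for CLIP's image tower (real code would load CLIP here).
image_encoder = torch.nn.Linear(128, DIM).requires_grad_(False)
class_name_emb = torch.randn(N_BASE, DIM)                 # fixed [CLASS] embeddings
ctx = torch.nn.Parameter(0.02 * torch.randn(N_CTX, DIM))  # the only trained weights

def text_features(class_ids):
    # Prompt = learned context vectors + class-name embedding, mean-pooled
    # here as a toy replacement for CLIP's transformer text encoder.
    prompts = torch.cat([ctx.expand(len(class_ids), -1, -1),
                         class_name_emb[class_ids].unsqueeze(1)], dim=1)
    return F.normalize(prompts.mean(dim=1), dim=-1)

optimizer = torch.optim.SGD([ctx], lr=0.01)
for episode in range(100):
    # Sample an N-way episode from the base classes (synthetic images here).
    class_ids = torch.randperm(N_BASE)[:N_WAY]
    images = torch.randn(N_WAY * SHOTS, 128)
    labels = torch.arange(N_WAY).repeat_interleave(SHOTS)

    img_feat = F.normalize(image_encoder(images), dim=-1)
    logits = 100.0 * img_feat @ text_features(class_ids).t()  # CLIP-style scale
    loss = F.cross_entropy(logits, labels)  # gradients flow only into ctx
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```

After training, the same text_features routine would be evaluated with novel-class embeddings, which is the base-to-novel transfer the abstract targets; resampling a fresh subset of base classes each episode is what stands in for "dividing the base classes into many different subsets".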
Pages: 5354-5363
Page count: 10
Related Papers
50 entries in total (first 10 shown)
  • [1] A Systematic Literature Review: Learning with Visual by The Help of Augmented Reality Helps Students Learn Better
    Liono, Rishka A.
    Amanda, Nadiran
    Pratiwi, Anisah
    Gunawan, Alexander A. S.
    5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020, 2021, 179 : 144 - 152
  • [2] SPT: Learning to Selectively Insert Prompts for Better Prompt Tuning
    Zhu, Wei
    Tan, Ming
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 11862 - 11878
  • [3] Learning to Learn Better for Video Object Segmentation
    Lan, Meng
    Zhang, Jing
    Zhang, Lefei
    Tao, Dacheng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 1205 - 1212
  • [4] Students Learn English Better ... Learning to Teach It!
    Noel, Barbara
    GIST-EDUCATION AND LEARNING RESEARCH JOURNAL, 2008, (02): : 102 - 118
  • [5] Learning to Learn Image Classifiers with Visual Analogy
    Zhou, Linjun
    Cui, Peng
    Yang, Shiqiang
    Zhu, Wenwu
    Tian, Qi
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 11489 - 11498
  • [6] Efficient Transfer Learning for Visual Tasks via Continuous Optimization of Prompts
    Conder, Jonathan
    Jefferson, Josephine
    Pages, Nathan
    Jawed, Khurram
    Nejati, Alireza
    Sagar, Mark
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT I, 2022, 13231 : 297 - 309
  • [7] RobustPrompt: Learning to defend against adversarial attacks with adaptive visual prompts
    Liu, Chang
    Xiang, Wenzhao
    Dong, Yinpeng
    Zhang, Xingxing
    Wang, Liyuan
    Duan, Ranjie
    Zheng, Shibao
    Su, Hang
    PATTERN RECOGNITION LETTERS, 2025, 190 : 161 - 168
  • [8] Play to learn, Learn to play. Creating better opportunities for learning in early childhood
    Ciolan, Laura Elena
    5TH INTERNATIONAL CONFERENCE EDU-WORLD 2012 - EDUCATION FACING CONTEMPORARY WORLD ISSUES, 2013, 76 : 186 - 189
  • [9] Learning to Learn How to Learn: Self-Adaptive Visual Navigation using Meta-Learning
    Wortsman, Mitchell
    Ehsani, Kiana
    Rastegari, Mohammad
    Farhadi, Ali
    Mottaghi, Roozbeh
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6743 - 6752
  • [10] Computations, circuits and biophysics in visual cortex: Learning to learn
    Poggio, T.
    PERCEPTION, 2011, 40 : 3 - 3