DPO: Discrete Prompt Optimization for Vision-Language Models

Authors
Liang, Nanhao [1, 2]
Liu, Yong [1]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Phys Sci, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei 230026, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Training; Optimization; Adaptation models; Visualization; Overfitting; Vectors; Vocabulary; Signal processing algorithms; Stochastic processes; Standards; Prompt learning; vision-language model
DOI
10.1109/LSP.2025.3528362
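Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract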
In recent years, the emergence of large vision-language models (VLMs) has catalyzed the development of prompt learning, in which networks are trained to improve VLM performance by learning continuous prompts. However, continuous prompt learning tends to overfit to Base classes and lacks interpretability, because the learned prompt vectors do not correspond to actual words. To overcome these limitations, we introduce Discrete Prompt Optimization (DPO), a method that optimizes text prompts in discrete word space. During training, scores are assigned to token embeddings, and these scores are then used to select the most effective token sequence for the downstream task. DPO was tested on 11 diverse datasets and outperformed baseline methods such as CLIP and CoOp on Novel classes in most cases. The discrete approach not only reduces overfitting but also improves transparency and interpretability, since the learned dataset-specific text prompts are human-readable.
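As a rough illustration of the selection step the abstract describes, the sketch below keeps one score per vocabulary token at each prompt position and picks the highest-scoring token to form a readable prompt. It is not the authors' implementation: the toy vocabulary, the score values, the softmax sampling used as a stand-in for training-time exploration, and all function names are hypothetical, and the step that would actually update the scores from a task loss is omitted.

```python
import math
import random

# Hypothetical toy vocabulary; a real system would use the VLM tokenizer's vocabulary.
VOCAB = ["a", "photo", "of", "the", "blurry", "satellite", "texture"]

# Hypothetical learned scores: one row per prompt position, one column per token.
SCORES = [
    [2.0, 0.1, 0.0, 0.3, -1.0, 0.5, 0.2],
    [0.0, 3.1, 0.2, 0.1, 0.4, 1.0, 0.0],
    [0.1, 0.0, 2.5, 0.7, 0.0, 0.0, 0.3],
    [1.8, 0.2, 0.1, 1.5, 0.0, 0.0, 0.1],
]

def softmax(row):
    """Turn one row of scores into a probability distribution."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def sample_prompt(score_table, rng):
    """Assumed training-time behaviour: sample one token index per position."""
    return [
        rng.choices(range(len(row)), weights=softmax(row), k=1)[0]
        for row in score_table
    ]

def select_prompt(score_table):
    """Selection step: keep the highest-scoring token index at each position."""
    return [max(range(len(row)), key=row.__getitem__) for row in score_table]

def decode(token_ids, class_name):
    """Join the chosen tokens and append the class name to get a text prompt."""
    return " ".join(VOCAB[i] for i in token_ids) + " " + class_name + "."

best = select_prompt(SCORES)
print(decode(best, "dog"))  # -> a photo of a dog.
```

Because the result is a sequence of real vocabulary tokens rather than free-floating vectors, the selected prompt can be read directly, which is the interpretability benefit the abstract claims for the discrete formulation.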
Pages: 671-675
Number of pages: 5