DPO: Discrete Prompt Optimization for Vision-Language Models

Cited: 0
Authors
Liang, Nanhao [1 ,2 ]
Liu, Yong [1 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Phys Sci, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei 230026, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Training; Optimization; Adaptation models; Visualization; Overfitting; Vectors; Vocabulary; Signal processing algorithms; Stochastic processes; Standards; Prompt learning; vision-language model;
DOI
10.1109/LSP.2025.3528362
CLC number
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline code
0808; 0809;
Abstract
In recent years, the emergence of large vision-language models (VLMs) has catalyzed the development of prompt learning, in which networks are trained to improve VLM performance by learning continuous prompts. However, continuous prompt learning often overfits to base classes and lacks interpretability, because the prompts are learned as free-form parameter vectors rather than words. To overcome these limitations, we introduce Discrete Prompt Optimization (DPO), a method that optimizes text prompts in a discrete word space. During training, scores are assigned to token embeddings and used to select the most effective token sequence for the downstream task. Evaluated across 11 diverse datasets, DPO outperforms baseline methods such as CLIP and CoOp on novel classes in most cases. This discrete approach not only reduces overfitting but also improves transparency and model interpretability, yielding dataset-specific text prompts that are easy to read and understand.
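The selection mechanism described in the abstract (scoring tokens, then picking a discrete token sequence) can be illustrated with a straight-through Gumbel-softmax, a common way to keep discrete token choices differentiable. The sketch below is an assumption-laden illustration, not the paper's exact algorithm: the class name PromptScorer, the CLIP-style vocabulary and embedding sizes, and the use of gumbel_softmax are all illustrative.

```python
# Minimal sketch of discrete prompt optimization, assuming a frozen
# CLIP-like text encoder. Names and sizes here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptScorer(nn.Module):
    """Learns a score for every vocabulary token at each prompt position;
    a hard Gumbel-softmax selects one discrete token per position while
    gradients flow through the soft scores (straight-through estimator)."""

    def __init__(self, vocab_size: int, embed_dim: int, prompt_len: int = 4):
        super().__init__()
        # One row of scores per prompt position, over the whole vocabulary.
        self.scores = nn.Parameter(torch.zeros(prompt_len, vocab_size))
        # Frozen token-embedding table (in practice, the VLM's own table).
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.embedding.weight.requires_grad_(False)

    def forward(self, tau: float = 1.0) -> torch.Tensor:
        # Hard one-hot selection in the forward pass, soft gradients backward.
        one_hot = F.gumbel_softmax(self.scores, tau=tau, hard=True)
        # (prompt_len, vocab) @ (vocab, embed_dim) -> discrete prompt embeddings.
        return one_hot @ self.embedding.weight

    def decode(self) -> list[int]:
        # Read off the highest-scoring token id per position after training.
        return self.scores.argmax(dim=-1).tolist()

scorer = PromptScorer(vocab_size=49408, embed_dim=512)
prompt_embeds = scorer()  # prepend these to the class-name token embeddings
print(scorer.decode())    # token ids; pass through the tokenizer to read words
```

Because the selected prompt is a sequence of real vocabulary tokens rather than free-form vectors, it can be decoded back into words, which is the source of the interpretability claimed in the abstract.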
Pages: 671-675
Page count: 5
Related papers
50 records in total
  • [41] CSP-DCPE: Category-Specific Prompt with Deep Contextual Prompt Enhancement for Vision-Language Models
    Wu, Chunlei
    Wu, Yixiang
    Xu, Qinfu
    Zi, Xuebin
    ELECTRONICS, 2025, 14 (04):
  • [42] A Slim Prompt-Averaged Consistency prompt learning for vision-language model
    He, Siyu
    Wang, Shengsheng
    Long, Sifan
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [43] Vision-Language Models for Biomedical Applications
    Thapa, Surendrabikram
    Naseem, Usman
    Zhou, Luping
    Kim, Jinman
    PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON VISION-LANGUAGE MODELS FOR BIOMEDICAL APPLICATIONS, VLM4BIO 2024, 2024, : 1 - 2
  • [44] The Neglected Tails in Vision-Language Models
    Parashar, Shubham
    Lin, Zhiqiu
    Liu, Tian
    Dong, Xiangjue
    Li, Yanan
    Ramanan, Deva
    Caverlee, James
    Kong, Shu
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 12988 - 12997
  • [45] VISION-LANGUAGE MODELS AS SUCCESS DETECTORS
    Du, Yuqing
    Konyushkova, Ksenia
    Denil, Misha
    Raju, Akhil
    Landon, Jessica
    Hill, Felix
    de Freitas, Nando
    Cabi, Serkan
     CONFERENCE ON LIFELONG LEARNING AGENTS, 2023, 232 : 120 - 136
  • [46] Pre-training A Prompt Pool for Vision-Language Model
    Liu, Jun
    Gu, Yang
    Yang, Zhaohua
    Guo, Shuai
    Liu, Huaqiu
    Chen, Yiqiang
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [47] A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
    Jin, Woojeong
    Cheng, Yu
    Shen, Yelong
    Chen, Weizhu
    Ren, Xiang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 2763 - 2775
  • [48] LAPT: Label-Driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
    Zhang, Yabin
    Zhu, Wenjie
    He, Chenhang
    Zhang, Lei
    COMPUTER VISION - ECCV 2024, PT LXXII, 2025, 15130 : 271 - 288
  • [49] Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning
    Gao, Zhengqing
    Ao, Xiang
    Zhang, Xu-Yao
    Liu, Cheng-Lin
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 439 - 452
  • [50] Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification
    Xuan, Yunyi
    Chen, Weijie
    Yang, Shicai
    Xie, Di
    Lin, Luojun
    Zhuang, Yueting
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4928 - 4938