DPO: Discrete Prompt Optimization for Vision-Language Models

Citations: 0
Authors
Liang, Nanhao [1 ,2 ]
Liu, Yong [1 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Phys Sci, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei 230026, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Training; Optimization; Adaptation models; Visualization; Overfitting; Vectors; Vocabulary; Signal processing algorithms; Stochastic processes; Standards; Prompt learning; vision-language model
DOI
10.1109/LSP.2025.3528362
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
In recent years, the emergence of large vision-language models (VLMs) has catalyzed the development of prompt learning, in which networks are trained to improve VLM performance by learning continuous prompts. However, continuous prompt learning is prone to overfitting the base classes and offers little interpretability, since the learned prompts are free-floating vectors rather than words. To overcome these limitations, we introduce Discrete Prompt Optimization (DPO), a method that optimizes text prompts in the discrete word space. During training, scores are assigned to token embeddings and used to select the most effective token sequence for the downstream task. Evaluated across 11 diverse datasets, DPO outperforms baseline methods such as CLIP and CoOp on novel classes in most cases. The discrete formulation not only reduces overfitting but also improves transparency and interpretability, yielding dataset-specific text prompts that are directly human-readable.
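The scoring-and-selection step described above lends itself to a short illustration. The sketch below is a minimal, hypothetical rendering of score-based discrete prompt selection, not the authors' released code: the module name DiscretePrompt, the straight-through Gumbel-softmax relaxation, and the frozen CLIP token-embedding table are all assumptions about how per-position token scores could drive a differentiable discrete choice.

```python
# Hypothetical sketch of score-based discrete prompt selection (not the
# paper's implementation). Assumes a CLIP-style text encoder whose frozen
# token-embedding table is passed in; the straight-through Gumbel-softmax
# is our assumed mechanism for keeping the discrete choice trainable.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscretePrompt(nn.Module):
    def __init__(self, vocab_size: int, prompt_len: int,
                 token_embedding: nn.Embedding):
        super().__init__()
        # One learnable score per (prompt position, vocabulary token).
        self.scores = nn.Parameter(torch.zeros(prompt_len, vocab_size))
        self.token_embedding = token_embedding  # assumed frozen

    def forward(self, tau: float = 1.0) -> torch.Tensor:
        # Training: straight-through Gumbel-softmax returns one-hot rows
        # in the forward pass while passing soft gradients backward.
        one_hot = F.gumbel_softmax(self.scores, tau=tau, hard=True)
        # (prompt_len, vocab) @ (vocab, dim) -> embeddings of chosen words.
        return one_hot @ self.token_embedding.weight

    @torch.no_grad()
    def decode(self) -> torch.Tensor:
        # Inference: pick the highest-scoring token at each position,
        # giving a discrete, readable prompt.
        return self.scores.argmax(dim=-1)
```

Because decode returns actual vocabulary indices, the learned prompt can be detokenized into plain words, which is the interpretability benefit the abstract highlights over continuous prompt vectors.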
Pages: 671-675
Number of pages: 5
Related Papers
50 records in total
  • [21] PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning
    Hussein, Noor
    Shamshad, Fahad
    Naseer, Muzammal
    Nandakumar, Karthik
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT XII, 2024, 15012 : 698 - 708
  • [22] CTPT: Continual Test-time Prompt Tuning for vision-language models
    Wang, Fan
    Han, Zhongyi
    Liu, Xingbo
    Yin, Yilong
    Gao, Xin
    PATTERN RECOGNITION, 2025, 161
  • [23] UMPA: Unified multi-modal prompt with adapter for vision-language models
    Jin, Zhengwei
    Wei, Yun
    MULTIMEDIA SYSTEMS, 2025, 31 (02)
  • [24] CPT: Colorful Prompt Tuning for pre-trained vision-language models
    Yao, Yuan
    Zhang, Ao
    Zhang, Zhengyan
    Liu, Zhiyuan
    Chua, Tat-Seng
    Sun, Maosong
    AI OPEN, 2024, 5 : 30 - 38
  • [25] Prompt-guided and multimodal landscape scenicness assessments with vision-language models
    Levering, Alex
    Marcos, Diego
    Jacobs, Nathan
    Tuia, Devis
    PLOS ONE, 2024, 19 (09)
  • [26] Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices
    Cai, Siqi
    Liu, Xuan
    Yuan, Jingling
    Zhou, Qihua
    PATTERN RECOGNITION, 2025, 163
  • [27] Cascade Prompt Learning for Vision-Language Model Adaptation
    Wu, Ge
    Zhang, Xin
    Li, Zheng
    Chen, Zhaowei
    Liang, Jiajun
    Yang, Jian
    Li, Xiang
    COMPUTER VISION - ECCV 2024, PT L, 2025, 15108 : 304 - 321
  • [28] CoPL: Contextual Prompt Learning for Vision-Language Understanding
    Goswami, Koustava
    Karanam, Srikrishna
    Udhayanan, Prateksha
    Joseph, K. J.
    Srinivasan, Balaji Vasan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18090 - 18098
  • [29] Vision-Language Tracking With CLIP and Interactive Prompt Learning
    Zhu, Hong
    Lu, Qingyang
    Xue, Lei
    Zhang, Pingping
    Yuan, Guanglin
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025, 26 (03) : 3659 - 3670
  • [30] Read-only Prompt Optimization for Vision-Language Few-shot Learning
    Lee, Dongjun
    Song, Seokwon
    Suh, Jihee
    Choi, Joonmyeong
    Lee, Sanghyeok
    Kim, Hyunwoo J.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1401 - 1411