DPO: Discrete Prompt Optimization for Vision-Language Models

Cited: 0
Authors
Liang, Nanhao [1,2]
Liu, Yong [1]
Affiliations
[1] Chinese Academy of Sciences, Hefei Institutes of Physical Science, Hefei 230031, People's Republic of China
[2] University of Science and Technology of China, Hefei 230026, People's Republic of China
Funding
National Key Research and Development Program of China
Keywords
Training; Optimization; Adaptation models; Visualization; Overfitting; Vectors; Vocabulary; Signal processing algorithms; Stochastic processes; Standards; Prompt learning; vision-language model
DOI
10.1109/LSP.2025.3528362
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
In recent years, the emergence of large vision-language models (VLMs) has catalyzed the development of prompt learning, in which networks are trained to improve VLM performance by learning continuous prompts. However, continuous prompt learning often overfits to base classes and lacks interpretability, because the learned prompts are free parameter vectors rather than actual words. To overcome these limitations, we introduce Discrete Prompt Optimization (DPO), a method that optimizes text prompts in the discrete word space. During training, scores are assigned to token embeddings and then used to select the most effective token sequence for the downstream task. Evaluated across 11 diverse datasets, DPO outperforms baseline methods such as CLIP and CoOp on novel classes in most cases. The discrete formulation not only reduces overfitting but also improves transparency and model interpretability, yielding dataset-specific text prompts that are easy to read.
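The abstract describes the mechanism only at a high level (score token embeddings, then select a token sequence). The minimal PyTorch sketch below shows one plausible reading of that idea; the class name DiscretePromptSelector, the straight-through Gumbel-softmax selection, the toy vocabulary size, and the MSE stand-in loss are all illustrative assumptions, not the paper's actual implementation.

# A minimal, self-contained sketch of discrete prompt selection via learnable
# per-position token scores. NOT the paper's exact algorithm: the selection
# rule (straight-through Gumbel-softmax) and the toy loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 1000   # toy vocabulary (real VLM tokenizers use ~49k BPE tokens)
EMBED_DIM = 64
PROMPT_LEN = 4      # number of discrete prompt tokens to learn

class DiscretePromptSelector(nn.Module):
    """Learns a score per (position, vocabulary token); a straight-through
    Gumbel-softmax makes the hard token choice differentiable."""
    def __init__(self, vocab_size=VOCAB_SIZE, prompt_len=PROMPT_LEN):
        super().__init__()
        # One learnable score vector over the vocabulary per prompt position.
        self.scores = nn.Parameter(torch.zeros(prompt_len, vocab_size))

    def forward(self, token_embeddings, tau=1.0):
        # token_embeddings: (vocab_size, embed_dim) frozen embedding table.
        if self.training:
            # Hard one-hot on the forward pass, soft gradient on the backward.
            one_hot = F.gumbel_softmax(self.scores, tau=tau, hard=True)
        else:
            # At test time, take the highest-scoring token per position.
            one_hot = F.one_hot(self.scores.argmax(-1), self.scores.size(-1)).float()
        # (prompt_len, vocab_size) @ (vocab_size, embed_dim) -> prompt embeddings
        return one_hot @ token_embeddings

    def decode(self):
        """Indices of the selected tokens: the interpretable discrete prompt."""
        return self.scores.argmax(dim=-1).tolist()

# Toy usage: the embedding table stays frozen, as the VLM would; gradients
# flow only into the scores.
embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
embedding.weight.requires_grad_(False)
selector = DiscretePromptSelector()
opt = torch.optim.Adam(selector.parameters(), lr=0.1)

target = torch.randn(PROMPT_LEN, EMBED_DIM)   # stand-in for a task loss signal
for _ in range(100):
    prompt = selector(embedding.weight)        # (PROMPT_LEN, EMBED_DIM)
    loss = F.mse_loss(prompt, target)          # real setup: CLIP contrastive loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print("selected token ids:", selector.decode())

Because the forward pass commits to hard one-hot choices, the optimized prompt is always a sequence of real vocabulary tokens, which is what makes the result human-readable, in contrast to continuous prompt vectors.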
Pages: 671-675
Page count: 5
Related Papers
50 records in total
  • [1] Learning to Prompt for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
  • [2] Conditional Prompt Learning for Vision-Language Models
    Zhou, Kaiyang
    Yang, Jingkang
    Loy, Chen Change
    Liu, Ziwei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16795 - 16804
  • [3] Consistent Prompt Learning for Vision-Language Models
    Zhang, Yonggang
    Tian, Xinmei
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [4] Adversarial Prompt Tuning for Vision-Language Models
    Zhang, Jiaming
    Ma, Xingjun
    Wang, Xin
    Qiu, Lingyu
    Wang, Jiaqi
    Jiang, Yu-Gang
    Sang, Jitao
    COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 56 - 72
  • [5] Learning Domain Invariant Prompt for Vision-Language Models
    Zhao, Cairong
    Wang, Yubin
    Jiang, Xinyang
    Shen, Yifei
    Song, Kaitao
    Li, Dongsheng
    Miao, Duoqian
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1348 - 1360
  • [6] JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
    Guo, Yuncheng
    Guo, Xiaodong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 28695 - 28705
  • [7] Distribution-Aware Prompt Tuning for Vision-Language Models
    Cho, Eulrang
    Kim, Jooyeon
    Kim, Hyunwoo J.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21947 - 21956
  • [8] Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
    Ma, Chengcheng
    Liu, Yang
    Deng, Jiankang
    Xie, Lingxi
    Dong, Weiming
    Xu, Changsheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4616 - 4629
  • [9] Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models
    Wang, Yubin
    Jiang, Xinyang
    Cheng, De
    Li, Dongsheng
    Zhao, Cairong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5749 - 5757