Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

Cited: 0
Authors
Wang, Yubin [1 ]
Jiang, Xinyang [2 ]
Cheng, De [3 ]
Li, Dongsheng [2 ]
Zhao, Cairong [1 ]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Xidian Univ, Xian, Peoples R China
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. With the emergence of large language models (LLMs), recent studies have explored using category-related descriptions as input to enhance prompt effectiveness. Nevertheless, conventional descriptions lack the structured information needed to represent the interconnections among the entities or attributes linked to a particular category. To address this limitation and prioritize the harnessing of structured knowledge, this paper advocates leveraging LLMs to build a graph for each description that models the entities and attributes describing the category, as well as their correlations. Existing prompt tuning methods are ill-equipped to handle this structured knowledge, so we propose a novel approach called Hierarchical Prompt Tuning (HPT), which enables simultaneous modeling of both structured and conventional linguistic knowledge. Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning. In addition, by incorporating high-level and global-level prompts that model overall semantics, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-term relationships. Extensive experiments demonstrate that HPT is highly effective and generalizes much better than existing state-of-the-art methods. Our code is available at https://github.com/Vill-Lab/2024-AAAI-HPT.
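To make the relationship-guided attention idea in the abstract concrete, the PyTorch sketch below shows one plausible way pair-wise attention over entity/attribute tokens could be biased by edges from an LLM-built description graph. It is a minimal illustration under stated assumptions: the class name RelationshipGuidedAttention, the learnable edge_weight scale, and the adjacency input adj are hypothetical, not the authors' released implementation (see the GitHub link above for that).

```python
import torch
import torch.nn as nn

class RelationshipGuidedAttention(nn.Module):
    """Toy sketch: multi-head attention over entity/attribute token
    embeddings whose scores are additively biased by a relationship
    (adjacency) matrix from an LLM-built description graph.
    Hypothetical names; not the authors' released code."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable scale for how strongly graph edges bias attention
        # (an assumption; the paper may weight relationships differently).
        self.edge_weight = nn.Parameter(torch.tensor(1.0))

    def forward(self, tokens: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) embeddings of entities/attributes
        # adj:    (B, N, N), 1.0 where the description graph links a pair
        B, N, _ = tokens.shape
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        # Reshape each to (B, heads, N, head_dim).
        q, k, v = (t.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        # Add the graph bias so related entity-attribute pairs attend more.
        attn = attn + self.edge_weight * adj.unsqueeze(1)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)

# Usage: 6 entity/attribute tokens per description, batch of 2.
x = torch.randn(2, 6, 64)
adj = torch.zeros(2, 6, 6)
adj[:, 0, 1] = adj[:, 1, 0] = 1.0   # e.g. an "entity--attribute" edge
low_level_prompt = RelationshipGuidedAttention(64)(x, adj)
print(low_level_prompt.shape)       # torch.Size([2, 6, 64])
```

The additive-bias design mirrors how graph structure is commonly injected into transformer attention; the resulting token representations would then serve as low-level prompts, with high-level and global-level prompts modeling overall semantics as the abstract describes.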
Pages: 5749 - 5757
Page count: 9
Related Papers (50 total)
  • [21] Modal interaction-enhanced prompt learning by transformer decoder for vision-language models
    Liu, Mingyue
    Zhao, Honggang
    Ma, Longfei
    Li, Mingyong
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12 (02)
  • [22] Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
    Li, Juncheng
    Gao, Minghe
    Wei, Longhui
    Tang, Siliang
    Zhang, Wenqiao
    Li, Mengze
    Ji, Wei
    Tian, Qi
    Chua, Tat-Seng
    Zhuang, Yueting
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 2551 - 2562
  • [23] Fine-grained multi-modal prompt learning for vision-language models
    Liu, Yunfei
    Deng, Yunziwei
    Liu, Anqi
    Liu, Yanan
    Li, Shengyang
    NEUROCOMPUTING, 2025, 636
  • [24] Distribution-Aware Prompt Tuning for Vision-Language Models
    Cho, Eulrang
    Kim, Jooyeon
    Kim, Hyunwoo J.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21947 - 21956
  • [25] Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
    Ma, Chengcheng
    Liu, Yang
    Deng, Jiankang
    Xie, Lingxi
    Dong, Weiming
    Xu, Changsheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4616 - 4629
  • [26] A Slim Prompt-Averaged Consistency prompt learning for vision-language model
    He, Siyu
    Wang, Shengsheng
    Long, Sifan
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [27] Conceptual Codebook Learning for Vision-Language Models
    Zhang, Yi
    Yu, Ke
    Wu, Siqi
    He, Zhihai
    COMPUTER VISION - ECCV 2024, PT LXXVII, 2024, 15135 : 235 - 251
  • [28] Exploring Vision-Language Models for Imbalanced Learning
    Wang, Y.
    Yu, Z.
    Wang, J.
    Heng, Q.
    Chen, H.
    Ye, W.
    Xie, R.
    Xie, X.
    Zhang, S.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (01) : 224 - 237
  • [29] TriMPL: Masked Multi-Prompt Learning with Knowledge Mixing for Vision-Language Few-shot Learning
    Liu, Xiangyu
    Shang, Yanlei
    Chen, Yong
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 552 - 560
  • [30] Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
    Wu, Cheng-En
    Tian, Yu
    Yu, Haichao
    Wang, Heng
    Morgado, Pedro
    Hu, Yu Hen
    Yang, Linjie
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15442 - 15451