Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

Cited by: 0
Authors
Wang, Yubin [1 ]
Jiang, Xinyang [2 ]
Cheng, De [3 ]
Li, Dongsheng [2 ]
Zhao, Cairong [1 ]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Xidian Univ, Xian, Peoples R China
Keywords:
DOI: Not available
CLC Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. As large language models (LLMs) have emerged, recent studies have explored the use of category-related descriptions as input to enhance prompt effectiveness. Nevertheless, conventional descriptions lack the structured information needed to represent the interconnections among the entities or attributes linked to a particular category. To address this limitation and prioritize harnessing structured knowledge, this paper advocates leveraging LLMs to build a graph for each description that models the entities and attributes describing the category, as well as their correlations. Existing prompt tuning methods are ill-equipped to handle this structured knowledge. Consequently, we propose a novel approach called Hierarchical Prompt Tuning (HPT), which enables simultaneous modeling of both structured and conventional linguistic knowledge. Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning. In addition, by incorporating high-level and global-level prompts that model overall semantics, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-range relationships. Extensive experiments demonstrate that HPT is highly effective and generalizes considerably better than existing state-of-the-art methods. Our code is available at https://github.com/Vill-Lab/2024-AAAI-HPT.
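To make the hierarchical design in the abstract concrete, the sketch below shows, in plain PyTorch, how a pair-wise relationship matrix from an LLM-built description graph could bias attention among entity/attribute tokens, and how low-, high-, and global-level prompts might be stacked. All class names, tensor shapes, and the way the levels are combined are illustrative assumptions rather than the authors' implementation; consult the linked repository for the official code.

```python
# Minimal, hypothetical sketch of the ideas described in the abstract (NOT the
# authors' code): the relationship matrix is used as an additive attention bias,
# and three prompt levels are simply concatenated before the text encoder.
import torch
import torch.nn as nn


class RelationshipGuidedAttention(nn.Module):
    """Self-attention over entity/attribute tokens, biased by a pair-wise
    relationship matrix taken from an LLM-built description graph."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, rel: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) embeddings of entities/attributes from one description
        # rel:    (B, N, N) edge weights of the description graph (0 = no relation)
        # A float attn_mask is added to the pre-softmax attention scores, so
        # strongly related pairs attend to each other more.
        bias = rel.repeat_interleave(self.attn.num_heads, dim=0)  # (B*H, N, N)
        out, _ = self.attn(tokens, tokens, tokens, attn_mask=bias)
        return out


class HierarchicalPrompts(nn.Module):
    """Builds low-, high-, and global-level prompts and concatenates them."""

    def __init__(self, dim: int, n_global: int = 4):
        super().__init__()
        self.low = RelationshipGuidedAttention(dim)       # structured (graph) knowledge
        self.high = nn.Linear(dim, dim)                    # whole-description semantics
        self.global_prompts = nn.Parameter(torch.randn(n_global, dim) * 0.02)

    def forward(self, graph_tokens, rel, desc_embed):
        # graph_tokens: (B, N, D), rel: (B, N, N), desc_embed: (B, D)
        low = self.low(graph_tokens, rel)                  # (B, N, D)
        high = self.high(desc_embed).unsqueeze(1)          # (B, 1, D)
        glob = self.global_prompts.expand(graph_tokens.size(0), -1, -1)
        return torch.cat([glob, high, low], dim=1)         # prompts for the text encoder


# Toy usage: 3 entity/attribute nodes per description, CLIP-like width 512.
prompts = HierarchicalPrompts(dim=512)(
    graph_tokens=torch.randn(2, 3, 512),
    rel=torch.rand(2, 3, 3),
    desc_embed=torch.randn(2, 512),
)
print(prompts.shape)  # torch.Size([2, 8, 512]): 4 global + 1 high + 3 low prompts
```

How the cross-level interlinks between these prompts are formed, and how they plug into the frozen CLIP text encoder, is specific to the paper and is not reproduced here.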
Pages: 5749-5757
Page count: 9
Related Papers
50 items in total
  • [31] Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization
    Zhu, Beier
    Niu, Yulei
    Lee, Saeil
    Hur, Minhoe
    Zhang, Hanwang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3834 - 3842
  • [32] SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models
    Ma, Xiaosong
    Zhang, Jie
    Guo, Song
    Xu, Wenchao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [33] A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
    Jin, Woojeong
    Cheng, Yu
    Shen, Yelong
    Chen, Weizhu
    Ren, Xiang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 2763 - 2775
  • [34] Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
    Wang, Fei
    Ding, Liang
    Rao, Jun
    Liu, Ye
    Shen, Li
    Ding, Changxing
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (12)
  • [35] Active Prompt Learning in Vision Language Models
    Bang, Jihwan
    Ahn, Sumyeong
    Lee, Jae-Gil
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 26994 - 27004
  • [36] UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models
    Li, Xin
    Behpour, Sima
    Doan, Thang
    He, Wenbin
    Gou, Liang
    Ren, Liu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [37] Multi-task Learning of Hierarchical Vision-Language Representation
    Duy-Kien Nguyen
    Okatani, Takayuki
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10484 - 10493
  • [38] Learning with Enriched Inductive Biases for Vision-Language Models
    Yang, Lingxiao
    Zhang, Ru-Yuan
    Chen, Qi
    Xie, Xiaohua
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,
  • [39] The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models
    Wu, Chenwei
    Li, Li Erran
    Ermon, Stefano
    Haffner, Patrick
    Ge, Rong
    Zhang, Zaiwei
    PROCEEDINGS ON I CAN'T BELIEVE IT'S NOT BETTER: FAILURE MODES IN THE AGE OF FOUNDATION MODELS AT NEURIPS 2023 WORKSHOPS, 2023, 239 : 118 - 126
  • [40] CTPT: Continual Test-time Prompt Tuning for vision-language models
    Wang, Fan
    Han, Zhongyi
    Liu, Xingbo
    Yin, Yilong
    Gao, Xin
    PATTERN RECOGNITION, 2025, 161