Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

Citations: 0
Authors
Wang, Yubin [1]
Jiang, Xinyang [2]
Cheng, De [3]
Li, Dongsheng [2]
Zhao, Cairong [1]
Affiliations
[1] Tongji University, Shanghai, China
[2] Microsoft Research Asia, Beijing, China
[3] Xidian University, Xi'an, China
Keywords: (not listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. With the emergence of large language models (LLMs), recent studies have explored using category-related descriptions as input to enhance prompt effectiveness. Nevertheless, conventional descriptions lack the structured information needed to effectively represent the interconnections among the entities or attributes linked to a particular category. To address this limitation and prioritize the harnessing of structured knowledge, this paper advocates leveraging LLMs to build a graph for each description, modeling the entities and attributes that describe the category as well as their correlations. Existing prompt tuning methods are ill-equipped to handle this structured knowledge. Consequently, we propose a novel approach called Hierarchical Prompt Tuning (HPT), which enables simultaneous modeling of both structured and conventional linguistic knowledge. Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning. In addition, by incorporating high-level and global-level prompts that model overall semantics, the proposed hierarchical structure forges cross-level interconnections and empowers the model to handle more complex and long-term relationships. Extensive experiments demonstrate that HPT is highly effective and generalizes much better than existing state-of-the-art methods. Our code is available at https://github.com/Vill-Lab/2024-AAAI-HPT.
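The relationship-guided attention module described in the abstract can be pictured with a short PyTorch sketch. This is a minimal illustration under assumptions, not the authors' released implementation (see the repository linked above): the class name RelationshipGuidedAttention, the (B, N, N) additive-bias encoding of the LLM-built description graph, and all shapes are hypothetical.

```python
# Minimal, hypothetical sketch of relationship-guided attention -- NOT the
# authors' code (see https://github.com/Vill-Lab/2024-AAAI-HPT for that).
import torch
import torch.nn as nn


class RelationshipGuidedAttention(nn.Module):
    """Self-attention whose scores are biased by pair-wise relations among
    the entities/attributes of an LLM-built description graph, assumed here
    to be supplied as an additive (B, N, N) bias."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens: torch.Tensor, rel_bias: torch.Tensor) -> torch.Tensor:
        # tokens:   (B, N, D) embeddings of entity/attribute tokens (low-level prompts)
        # rel_bias: (B, N, N) graph-derived bias; large negative where no relation holds
        B, N, D = tokens.shape
        qkv = self.qkv(tokens).reshape(B, N, 3, self.num_heads, D // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, H, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, H, N, N)
        attn = attn + rel_bias.unsqueeze(1)            # inject graph structure per head
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)


# Toy usage: a random adjacency stands in for the graph built by the LLM.
x = torch.randn(2, 16, 512)          # 16 entity/attribute tokens per description
adj = torch.rand(2, 16, 16) > 0.5    # hypothetical relation mask from the graph
bias = (~adj).float() * -1e4         # -10000 where unrelated, 0 where related
layer = RelationshipGuidedAttention(dim=512)
print(layer(x, bias).shape)          # torch.Size([2, 16, 512])
```

In this sketch the graph enters purely as an additive attention bias, so unrelated entity/attribute pairs are suppressed before the softmax; per the abstract, the full HPT design further couples such low-level prompts with high-level and global-level prompts to capture cross-level and long-term relationships.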
Pages: 5749-5757
Page count: 9
Related papers
50 records in total (items 41-50 shown)
  • [41] UMPA: Unified multi-modal prompt with adapter for vision-language models
    Jin, Zhengwei
    Wei, Yun
    MULTIMEDIA SYSTEMS, 2025, 31 (02)
  • [42] CPT: Colorful Prompt Tuning for pre-trained vision-language models
    Yao, Yuan
    Zhang, Ao
    Zhang, Zhengyan
    Liu, Zhiyuan
    Chua, Tat-Seng
    Sun, Maosong
    AI OPEN, 2024, 5 : 30 - 38
  • [43] Prompt-guided and multimodal landscape scenicness assessments with vision-language models
    Levering, Alex
    Marcos, Diego
    Jacobs, Nathan
    Tuia, Devis
    PLOS ONE, 2024, 19 (09)
  • [44] GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
    Li, Xin
    Lian, Dongze
    Lu, Zhihe
    Bai, Jiawang
    Chen, Zhibo
    Wang, Xinchao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [45] Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model
    Du, Yu
    Wei, Fangyun
    Zhang, Zihe
    Shi, Miaojing
    Gao, Yue
    Li, Guoqi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 14064 - 14073
  • [46] Prompt-Ladder: Memory-efficient prompt tuning for vision-language models on edge devices
    Cai, Siqi
    Liu, Xuan
    Yuan, Jingling
    Zhou, Qihua
    PATTERN RECOGNITION, 2025, 163
  • [47] Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
    Chen, Yeming
    Zhang, Siyu
    Sun, Yaoru
    Yang, Jun
    Liang, Weijian
    Wang, Haoran
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (03) : 2768 - 2781
  • [48] A survey of efficient fine-tuning methods for Vision-Language Models - Prompt and Adapter
    Xing, Jialu
    Liu, Jianping
    Wang, Jian
    Sun, Lulu
    Chen, Xi
    Gu, Xunxun
    Wang, Yingfei
    COMPUTERS & GRAPHICS-UK, 2024, 119
  • [49] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [50] GalLoP: Learning Global and Local Prompts for Vision-Language Models
    Lafon, Marc
    Ramzi, Elias
    Rambour, Clement
    Audebert, Nicolas
    Thome, Nicolas
    COMPUTER VISION - ECCV 2024, PT LXI, 2025, 15119 : 264 - 282