Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

被引：0

作者：

Wang, Yubin ^{[1
]}

Jiang, Xinyang ^{[2
]}

Cheng, De ^{[3
]}

Li, Dongsheng ^{[2
]}

Zhao, Cairong ^{[1
]}

机构：

[1] Tongji Univ, Shanghai, Peoples R China

[2] Microsoft Res Asia, Beijing, Peoples R China

[3] Xidian Univ, Xian, Peoples R China

来源：

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6 | 2024年

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. As large language models (LLMs) have emerged, recent studies have explored the use of category-related descriptions as input to enhance prompt effectiveness. Nevertheless, conventional descriptions fall short of structured information that effectively represents the interconnections among entities or attributes linked to a particular category. To address this limitation and prioritize harnessing structured knowledge, this paper advocates for leveraging LLMs to build a graph for each description to model the entities and attributes describing the category, as well as their correlations. Preexisting prompt tuning methods exhibit inadequacies in managing this structured knowledge. Consequently, we propose a novel approach called Hierarchical Prompt Tuning (HPT), which enables simultaneous modeling of both structured and conventional linguistic knowledge. Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning. In addition, by incorporating high-level and global-level prompts modeling overall semantics, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-term relationships. Extensive experiments demonstrate that our HPT shows strong effectiveness and generalizes much better than existing SOTA methods. Our code is available at https://github.com/Vill-Lab/2024-AAAI-HPT.

引用

页码：5749 / 5757

页数：9

共 50 条

[1] Learning to Prompt for Vision-Language Models
Zhou, Kaiyang
Yang, Jingkang
Loy, Chen Change
Liu, Ziwei
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2337 - 2348
[2] Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
International Journal of Computer Vision, 2022, 130 : 2337 - 2348
[3] Conditional Prompt Learning for Vision-Language Models
Zhou, Kaiyang
Yang, Jingkang
Loy, Chen Change
Liu, Ziwei
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16795 - 16804
[4] Consistent prompt learning for vision-language models
Zhang, Yonggang
Tian, Xinmei
KNOWLEDGE-BASED SYSTEMS, 2025, 310
[5] Learning Domain Invariant Prompt for Vision-Language Models
Zhao, Cairong
Wang, Yubin
Jiang, Xinyang
Shen, Yifei
Song, Kaitao
Li, Dongsheng
Miao, Duoqian
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1348 - 1360
[6] JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
Guo, Yuncheng
Guo, Xiaodong
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 28695 - 28705
[7] Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Kan, Baoshuo
Wang, Teng
Lu, Wenpeng
Zhen, Xiantong
Guan, Weili
Zheng, Feng
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15624 - 15634
[8] Adversarial Prompt Tuning for Vision-Language Models
Zhang, Jiaming
Ma, Xingjun
Wang, Xin
Qiu, Lingyu
Wang, Jiaqi
Jiang, Yu-Gang
Sang, Jitao
COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 56 - 72
[9] Concept-Guided Prompt Learning for Generalization in Vision-Language Models
Zhang, Yi
Zhang, Ce
Yu, Ke
Tang, Yushun
He, Zhihai
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7377 - 7386
[10] Learning to Prompt for Vision-Language Emotion Recognition
Xie, Hongxia
Chung, Hua
Shuai, Hong-Han
Cheng, Wen-Huang
2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023,

← 1 2 3 4 5 →