Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

Cited by: 0
Authors
Zhang, Yichi [1 ,2 ]
Dong, Yinpeng [1 ,2 ]
Zhang, Siyuan [1 ]
Min, Tianzan [1 ]
Su, Hang [1 ,3 ]
Zhu, Jun [1 ,2 ,3 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Bosch Joint ML Ctr, Dept Comp Sci & Tech, Inst AI,THBI Lab,BNRist Ctr, Beijing 100084, Peoples R China
[2] RealAI, Beijing, Peoples R China
[3] Pazhou Lab Huangpu, Guangzhou, Guangdong, Peoples R China
DOI
10.1109/CVPR52733.2024.02508
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Although Multimodal Large Language Models (MLLMs) have demonstrated promising versatile capabilities, their performance is still inferior to that of specialized models on downstream tasks, which makes adaptation necessary to enhance their utility. However, fine-tuning methods require independent training for every model, leading to huge computation and memory overheads. In this paper, we propose a novel setting in which we aim to improve the performance of diverse MLLMs with a group of shared parameters optimized for a downstream task. To achieve this, we propose Transferable Visual Prompting (TVP), a simple and effective approach to generate visual prompts that can transfer to different models and improve their performance on downstream tasks after being trained on only one model. We introduce two strategies to address the cross-model feature corruption of existing visual prompting methods and to enhance the transferability of the learned prompts: 1) Feature Consistency Alignment, which imposes constraints on the prompted feature changes to maintain task-agnostic knowledge; and 2) Task Semantics Enrichment, which encourages the prompted images to contain richer task-specific semantics with language guidance. We validate the effectiveness of TVP through extensive experiments with 6 modern MLLMs on a wide variety of tasks ranging from object recognition and counting to multimodal reasoning and hallucination correction.
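The abstract describes a visual prompt (shared pixel-space parameters added to input images) trained with a task loss plus two regularizers. The following is a minimal NumPy sketch of that objective, not the authors' implementation: the border-style prompt, the stand-in feature vectors, the `alpha`/`beta` weights, and all function names are illustrative assumptions.

```python
import numpy as np

def border_mask(h, w, pad):
    """1.0 on a border of width `pad`, 0.0 in the interior."""
    m = np.zeros((h, w))
    m[:pad, :] = m[-pad:, :] = 1.0
    m[:, :pad] = m[:, -pad:] = 1.0
    return m

def apply_prompt(image, prompt, pad=4):
    """Overlay a learnable border prompt onto an image of shape (C, H, W).
    Interior pixels are left untouched."""
    _, h, w = image.shape
    return image + border_mask(h, w, pad)[None] * prompt

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def tvp_loss(task_loss, feat_clean, feat_prompted, text_emb,
             alpha=1.0, beta=1.0):
    """Total objective sketch:
       task loss
       + alpha * feature-consistency term (penalize drift of prompted
                 features from clean features, preserving task-agnostic
                 knowledge)
       + beta  * task-semantics term (pull prompted features toward a
                 language embedding of the task)."""
    fca = np.mean((feat_prompted - feat_clean) ** 2)
    tse = 1.0 - cosine(feat_prompted, text_emb)
    return task_loss + alpha * fca + beta * tse
```

In an actual pipeline, `feat_clean`/`feat_prompted` would come from the single source MLLM's vision encoder and `text_emb` from a text encoder; the prompt parameters would then be optimized by backpropagating this objective through the frozen model.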
Pages: 26552-26562
Page count: 11
Related Papers
50 items total
  • [21] A survey on multimodal large language models
    Yin, Shukang
    Fu, Chaoyou
    Zhao, Sirui
    Li, Ke
    Sun, Xing
    Xu, Tong
    Chen, Enhong
    NATIONAL SCIENCE REVIEW, 2024, 11 (12) : 277 - 296
  • [23] Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering
    Hu, Zhongjian
    Yang, Peng
    Liu, Fengyuan
    Meng, Yuan
    Liu, Xingyu
    BIG DATA MINING AND ANALYTICS, 2024, 7 (03): : 843 - 857
  • [24] How to Optimize Prompting for Large Language Models in Clinical Research
    Lee, Jeong Hyun
    Shin, Jaeseung
    KOREAN JOURNAL OF RADIOLOGY, 2024, 25 (10) : 869 - 873
  • [25] Prompting large language models for inner gains in radiology studies
    Ray, Partha Pratim
    CLINICAL IMAGING, 2025, 120
  • [26] Active Prompting with Chain-of-Thought for Large Language Models
    Diao, Shizhe
    Wang, Pengcheng
    Lin, Yong
    Pan, Rui
    Liu, Xiang
    Zhang, Tong
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1330 - 1350
  • [27] Guiding Large Language Models via Directional Stimulus Prompting
    Li, Zekun
    Peng, Baolin
    He, Pengcheng
    Galley, Michel
    Gao, Jianfeng
    Yan, Xifeng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [28] The Art of Asking: Prompting Large Language Models for Serendipity Recommendations
    Fu, Zhe
    Niu, Xi
    PROCEEDINGS OF THE 2024 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2024, 2024, : 157 - 166
  • [29] Attention Prompting on Image for Large Vision-Language Models
    Yu, Runpeng
    Yu, Weihao
    Wang, Xinchao
    COMPUTER VISION - ECCV 2024, PT XXX, 2025, 15088 : 251 - 268
  • [30] Grammar Prompting for Domain-Specific Language Generation with Large Language Models
    Wang, Bailin
    Wang, Zi
    Wang, Xuezhi
    Cao, Yuan
    Saurous, Rif A.
    Kim, Yoon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,