Pro-Tuning: Unified Prompt Tuning for Vision Tasks

Cited by: 7
Authors
Nie, Xing [1 ,2 ]
Ni, Bolin [1 ,2 ]
Chang, Jianlong [3 ]
Meng, Gaofeng [1 ,2 ,4 ]
Huo, Chunlei [5 ,6 ]
Xiang, Shiming [1 ,2 ]
Tian, Qi [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Syst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Huawei Cloud & AI, Beijing 100095, Peoples R China
[4] HK Inst Sci & Innovat, CAS Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China
[5] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[6] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Keywords
Task analysis; Adaptation models; Tuning; Computational modeling; Transformers; Visualization; Training; Prompt-based learning; representation learning; task-specific knowledge; transfer learning;
DOI
10.1109/TCSVT.2023.3327605
CLC Classification Codes
TM [Electrical Technology]; TN [Electronic and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
In computer vision, fine-tuning is the de facto approach to leveraging pre-trained vision models for downstream tasks. However, deploying it in practice is challenging because it performs parameter-inefficient global updates and relies heavily on high-quality downstream data. Recently, prompt-based learning, which adds task-relevant prompts to adapt pre-trained models to downstream tasks, has drastically boosted the performance of many natural language downstream tasks. In this work, we extend this notable prompt-driven transfer ability to vision models as an alternative to fine-tuning. To this end, we propose parameter-efficient Prompt tuning (Pro-tuning), which adapts diverse frozen pre-trained models to a wide variety of downstream vision tasks. The key to Pro-tuning is prompt-based tuning, i.e., learning task-specific vision prompts for downstream input images while keeping the pre-trained model frozen. By training only a small number of additional parameters, Pro-tuning produces compact and robust downstream models for both CNN-based and transformer-based network architectures. Comprehensive experiments show that Pro-tuning outperforms fine-tuning on a broad range of vision tasks and scenarios, including image classification (covering generic objects, class imbalance, image corruption, natural adversarial examples, and out-of-distribution generalization) and dense prediction tasks such as object detection and semantic segmentation.
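The core recipe described in the abstract, freezing the pre-trained backbone and training only a small set of prompt parameters plus a lightweight head, can be illustrated with the minimal PyTorch sketch below. All module names and hyper-parameters here are illustrative assumptions for the general prompt-tuning idea, not the authors' actual Pro-tuning implementation.

# Minimal sketch of prompt-based tuning with a frozen backbone (assumption: PyTorch/torchvision setup;
# this is not the authors' Pro-tuning code).
import torch
import torch.nn as nn
from torchvision.models import resnet50

class PromptedClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.backbone = resnet50(weights="IMAGENET1K_V2")
        for p in self.backbone.parameters():
            p.requires_grad = False                    # keep the pre-trained model frozen
        in_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()               # drop the original classifier
        # learnable, task-specific visual prompt added to the input image
        self.prompt = nn.Parameter(torch.zeros(1, 3, 224, 224))
        # small trainable head for the downstream task
        self.head = nn.Linear(in_features, num_classes)

    def forward(self, x):
        x = x + self.prompt                            # inject the learned prompt
        return self.head(self.backbone(x))

model = PromptedClassifier(num_classes=100)
# only the prompt and head parameters are updated during downstream training
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

Only the prompt tensor and the head receive gradients, so the task-specific parameters and optimizer state remain a small fraction of what full fine-tuning of the backbone would require.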
Pages: 4653-4667
Number of pages: 15
Related Papers
50 records in total
  • [1] LION: Implicit Vision Prompt Tuning
    Wang, Haixin
    Chang, Jianlong
    Zhai, Yihang
    Luo, Xiao
    Sun, Jinan
    Lin, Zhouchen
    Tian, Qi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5372 - 5380
  • [2] Prompt Tuning for Unified Multimodal Pretrained Models
    Yang, Hao
    Lin, Junyang
    Yang, An
    Wang, Peng
    Zhou, Chang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 402 - 416
  • [3] P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks
    Liu, Xiao
    Ji, Kaixuan
    Fu, Yicheng
    Tam, Weng Lam
    Du, Zhengxiao
    Yang, Zhilin
    Tang, Jie
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): (SHORT PAPERS), VOL 2, 2022, : 61 - 68
  • [4] Adversarial Prompt Tuning for Vision-Language Models
    Zhang, Jiaming
    Ma, Xingjun
    Wang, Xin
    Qiu, Lingyu
    Wang, Jiaqi
    Jiang, Yu-Gang
    Sang, Jitao
    COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 56 - 72
  • [5] Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization
    Razdaibiedina, Anastasia
    Mao, Yuning
    Khabsa, Madian
    Lewis, Mike
    Hou, Rui
    Ba, Jimmy
    Almahairi, Amjad
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 6740 - 6757
  • [6] Data Augmentation by Prompt Tuning on Natural Language Understanding Tasks
    Wang, Yu-Hao
    Chang, Chia-Ming
    Tsai, Yi-Hang
    Hwang, San-Yih
    2024 11TH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-TAIWAN, ICCE-TAIWAN 2024, 2024, : 807 - 808
  • [7] Visual Prompt Tuning
    Jia, Menglin
    Tang, Luming
    Chen, Bor-Chun
    Cardie, Claire
    Belongie, Serge
    Hariharan, Bharath
    Lim, Ser-Nam
    COMPUTER VISION - ECCV 2022, PT XXXIII, 2022, 13693 : 709 - 727
  • [8] HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks
    Zhang, Zhengkun
    Guo, Wenya
    Meng, Xiaojun
    Wang, Yasheng
    Wang, Yadao
    Jiang, Xin
    Liu, Qun
    Yang, Zhenglu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11442 - 11453
  • [9] Distribution-Aware Prompt Tuning for Vision-Language Models
    Cho, Eulrang
    Kim, Jooyeon
    Kim, Hyunwoo J.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21947 - 21956
  • [10] Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models
    Ma, Chengcheng
    Liu, Yang
    Deng, Jiankang
    Xie, Lingxi
    Dong, Weiming
    Xu, Changsheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4616 - 4629