ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

Times cited: 0
Authors
Pan, Junting [1]
Lin, Ziyi [1]
Zhu, Xiatian [2]
Shao, Jing [1]
Li, Hongsheng [1, 3]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Univ Surrey, Surrey Inst People Ctr Artificial Intelligence, CVSSP, Surrey, England
[3] Ctr Perceptual & Interact Intelligence Ltd, Traverse City, MI USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Capitalizing on large pre-trained models for various downstream tasks of interest has recently emerged as a promising approach. Due to the ever-growing model size, the standard full fine-tuning-based task adaptation strategy becomes prohibitively costly in terms of model training and storage. This has led to a new research direction in parameter-efficient transfer learning. However, existing attempts typically focus on downstream tasks from the same modality (e.g., image understanding) as the pre-trained model. This is limiting because, for some modalities (e.g., video understanding), a strong pre-trained model with sufficient knowledge is scarce or unavailable. In this work, we investigate such a novel cross-modality transfer learning setting, namely parameter-efficient image-to-video transfer learning. To solve this problem, we propose a new Spatio-Temporal Adapter (ST-Adapter) for parameter-efficient fine-tuning per video task. With a built-in spatio-temporal reasoning capability in a compact design, ST-Adapter enables a pre-trained image model without temporal knowledge to reason about dynamic video content at a small (~8%) per-task parameter cost, requiring approximately 20 times fewer updated parameters than previous work. Extensive experiments on video action recognition tasks show that our ST-Adapter can match or even outperform the strong full fine-tuning strategy and state-of-the-art video models, whilst enjoying the advantage of parameter efficiency. Code and models are available at https://github.com/linziyi96/st-adapter
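The abstract describes the adapter only at a high level. The following is a minimal, pure-Python sketch under stated assumptions, not the authors' implementation (their code is at the repository linked above). It assumes a bottleneck of the form X + f(DWConv(X W_down)) W_up applied to the token features of a frozen image backbone, with the depthwise convolution restricted to the frame axis (kernel size (K, 1, 1)), GELU as f, and an optional zero-initialised up projection so the adapter starts as an identity map. The helper names `st_adapter`, `make_params` and `adapter_param_count` are hypothetical, and the sizes in the parameter count (C = 768, bottleneck 384, K = 3, one adapter in each of 12 blocks, i.e. a ViT-B-like backbone) are assumptions chosen for illustration.

```python
import math
import random


def linear(x, w, b):
    """y = W x + b for one token; w is d_out x d_in, b has length d_out."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def gelu(v):
    """tanh approximation of GELU (the choice of activation is an assumption)."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def depthwise_temporal_conv(z, kernels, bias):
    """Per-channel 1-D convolution along the frame axis with zero padding.

    z: T x N x D (frames x tokens x channels); kernels: D x K; bias: length D.
    Equivalent to a depthwise 3-D convolution with kernel size (K, 1, 1).
    """
    T, N, D = len(z), len(z[0]), len(z[0][0])
    K = len(kernels[0])
    pad = K // 2
    out = [[[0.0] * D for _ in range(N)] for _ in range(T)]
    for t in range(T):
        for n in range(N):
            for d in range(D):
                acc = bias[d]
                for k in range(K):
                    src = t + k - pad
                    if 0 <= src < T:  # frames outside the clip contribute zero
                        acc += kernels[d][k] * z[src][n][d]
                out[t][n][d] = acc
    return out


def st_adapter(x, p):
    """x + W_up f(DWConv(W_down x)) on T x N x C tokens from a frozen backbone."""
    down = [[linear(tok, p["w_down"], p["b_down"]) for tok in frame] for frame in x]
    mixed = depthwise_temporal_conv(down, p["k"], p["b_k"])  # only step mixing frames
    out = []
    for t, frame in enumerate(x):
        new_frame = []
        for n, tok in enumerate(frame):
            delta = linear([gelu(v) for v in mixed[t][n]], p["w_up"], p["b_up"])
            new_frame.append([xi + di for xi, di in zip(tok, delta)])  # residual add
        out.append(new_frame)
    return out


def make_params(c, r, k, seed=0, zero_up=False):
    """Hypothetical toy parameters; zero_up=True makes the adapter an identity map."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    w_up = [[0.0] * r for _ in range(c)] if zero_up else rand(c, r)
    return {
        "w_down": rand(r, c), "b_down": [0.0] * r,
        "k": rand(r, k), "b_k": [0.0] * r,
        "w_up": w_up, "b_up": [0.0] * c,
    }


def adapter_param_count(c, r, k):
    """Trainable parameters of one adapter: down proj + depthwise conv + up proj, with biases."""
    return (c * r + r) + (r * k + r) + (r * c + c)


if __name__ == "__main__":
    # Assumed ViT-B-like sizes: 12 blocks, width 768, bottleneck 384, temporal kernel 3.
    print(12 * adapter_param_count(768, 384, 3))  # 7110144
```

Under these assumed sizes, the 12 adapters hold 7,110,144 trainable parameters, roughly 8% of an ~86M-parameter ViT-B backbone, which is consistent in magnitude with the abstract's per-task figure; the actual cost depends on the configuration the authors use. In this sketch the depthwise convolution is the only operation that mixes information across frames, so a change to frame 0 reaches only frames 0 and 1 when K = 3, while the frozen backbone itself still processes each frame independently.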
Pages: 16
Related papers
50 in total
  • [31] HyperCon: Image-To-Video Model Transfer for Video-To-Video Translation Tasks
    Szeto, Ryan
    El-Khamy, Mostafa
    Lee, Jungwon
    Corso, Jason J.
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3079 - 3088
  • [32] Prompt tuning for parameter-efficient medical image segmentation
    Fischer, Marc
    Bartler, Alexander
    Yang, Bin
    MEDICAL IMAGE ANALYSIS, 2024, 91
  • [33] PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in Non-english Text-to-Image Generation
    Ma, Jian
    Chen, Chen
    Xie, Qingsong
    Lu, Haonan
    COMPUTER VISION - ECCV 2024, PT XXXVII, 2025, 15095 : 89 - 105
  • [34] iHair Recolorer: deep image-to-video hair color transfer
    Wu, Keyu
    Yang, Lingchen
    Fu, Hongbo
    Zheng, Youyi
    SCIENCE CHINA INFORMATION SCIENCES, 2021, 64 (11) : 52 - 66
  • [35] PERS: Parameter-Efficient Multimodal Transfer Learning for Remote Sensing Visual Question Answering
    He, Jinlong
    Liu, Gang
    Li, Pengfei
    Su, Xiaonan
    Jiang, Wenhua
    Zhang, Dongze
    Zhong, Shenjun
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 14823 - 14835
  • [36] VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation
    Qiao, Yanyuan
    Yu, Zheng
    Wu, Qi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15397 - 15406
  • [37] Learning to Forecast and Refine Residual Motion for Image-to-Video Generation
    Zhao, Long
    Peng, Xi
    Tian, Yu
    Kapadia, Mubbasir
    Metaxas, Dimitris
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 403 - 419
  • [38] Parameter-Efficient Finetuning for Robust Continual Multilingual Learning
    Badola, Kartikeya
    Dave, Shachi
    Talukdar, Partha
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 9763 - 9780
  • [39] Activity Image-to-Video Retrieval via Domain Adversarial Learning
    Liu, Yubin
    Yang, Jinfu
    Yan, Xue
    Song, Lin
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022, : 6183 - 6188
  • [40] Parameter-efficient feature-based transfer for paraphrase identification
    Liu, Xiaodong
    Rzepka, Rafal
    Araki, Kenji
    NATURAL LANGUAGE ENGINEERING, 2023, 29 (04) : 1066 - 1096