ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

Authors
Pan, Junting [1]
Lin, Ziyi [1]
Zhu, Xiatian [2]
Shao, Jing [1]
Li, Hongsheng [1, 3]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Univ Surrey, Surrey Inst People Ctr Artificial Intelligence, CVSSP, Surrey, England
[3] Ctr Perceptual & Interact Intelligence Ltd, Traverse City, MI USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Capitalizing on large pre-trained models for various downstream tasks of interest has recently emerged with promising performance. Due to ever-growing model sizes, the standard task adaptation strategy based on full fine-tuning becomes prohibitively costly in terms of model training and storage. This has led to a new research direction in parameter-efficient transfer learning. However, existing attempts typically focus on downstream tasks from the same modality as the pre-trained model (e.g., image understanding). This is limiting because in some modalities (e.g., video understanding) a strong pre-trained model with sufficient knowledge is scarce or unavailable. In this work, we investigate a novel cross-modality transfer learning setting, namely parameter-efficient image-to-video transfer learning. To solve this problem, we propose a new Spatio-Temporal Adapter (ST-Adapter) for parameter-efficient fine-tuning per video task. With a built-in spatio-temporal reasoning capability in a compact design, ST-Adapter enables a pre-trained image model without temporal knowledge to reason about dynamic video content at a small (~8%) per-task parameter cost, requiring approximately 20 times fewer updated parameters than previous work. Extensive experiments on video action recognition tasks show that ST-Adapter can match or even outperform the strong full fine-tuning strategy and state-of-the-art video models, whilst enjoying the advantage of parameter efficiency. Code and model are available at https://github.com/linziyi96/st-adapter
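To give a feel for the "small per-task parameter cost" mentioned in the abstract, the following is a rough back-of-the-envelope sketch, not the authors' code. It assumes a bottleneck adapter made of a down-projection, a depth-wise 3D convolution over time, and an up-projection. The function name, the hidden width of 768, the bottleneck width of 384, the 3x1x1 kernel, the 12 layers, and the backbone size of roughly 86 million parameters are all illustrative assumptions that do not appear in the abstract.

```python
def adapter_param_count(d_model, bottleneck, kernel=(3, 1, 1)):
    """Count trainable parameters of one hypothetical bottleneck adapter.

    Assumed layout (an illustration, not confirmed by the abstract):
    linear down-projection -> depth-wise 3D convolution -> linear up-projection,
    each with a bias term.
    """
    kt, kh, kw = kernel
    down = d_model * bottleneck + bottleneck        # weights + bias
    conv = bottleneck * kt * kh * kw + bottleneck   # one filter per channel + bias
    up = bottleneck * d_model + d_model             # weights + bias
    return down + conv + up


if __name__ == "__main__":
    # All numbers below are assumptions chosen for illustration.
    d_model, bottleneck, num_layers = 768, 384, 12
    backbone_params = 86_000_000  # approximate size of a ViT-B-like image model

    per_adapter = adapter_param_count(d_model, bottleneck)
    total = per_adapter * num_layers  # one adapter per layer (assumed)
    print(f"parameters per adapter: {per_adapter:,}")
    print(f"total adapter parameters: {total:,}")
    print(f"fraction of backbone: {total / backbone_params:.1%}")
```

Under these assumed sizes the adapters add up to a single-digit percentage of the backbone, which is in the same range as the figure quoted in the abstract. The actual configuration is documented in the linked repository.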
Pages: 16
Related papers
50 results in total
  • [1] AiRs: Adapter in Remote Sensing for Parameter-Efficient Transfer Learning
    Hu, Leiyi
    Yu, Hongfeng
    Lu, Wanxuan
    Yin, Dongshuo
    Sun, Xian
    Fu, Kun
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 18
  • [2] Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
    Qing, Zhiwu
    Zhang, Shiwei
    Huang, Ziyuan
    Zhang, Yingya
    Gao, Changxin
    Zhao, Deli
    Sang, Nong
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 13888 - 13898
  • [3] VL-ADAPTER: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
    Sung, Yi-Lin
    Cho, Jaemin
    Bansal, Mohit
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5217 - 5227
  • [4] Parameter-Efficient Transfer Learning for NLP
    Houlsby, Neil
    Giurgiu, Andrei
    Jastrzebski, Stanislaw
    Morrone, Bruna
    de Laroussilhe, Quentin
    Gesmundo, Andrea
    Attariyan, Mona
    Gelly, Sylvain
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [5] Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language
    Liu, Yuqi
    Xu, Luhui
    Xiong, Pengfei
    Jin, Qin
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023, : 1781 - 1789
  • [6] READ-PVLA: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
    Nguyen, Thong
    Wu, Xiaobao
    Dong, Xinshuai
    Le, Khoi
    Hu, Zhiyuan
    Nguyen, Cong-Duy
    Ng, See-Kiong
    Luu, Anh Tuan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 18824 - 18832
  • [7] Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
    Zhou, Xin
    Liang, Dingkang
    Xu, Wei
    Zhu, Xingkui
    Xu, Yihan
    Zou, Zhikang
    Bai, Xiang
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 14707 - 14717
  • [8] Parameter-Efficient Transfer Learning with Diff Pruning
    Guo, Demi
    Rush, Alexander M.
    Kim, Yoon
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4884 - 4896
  • [9] R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
    Liu, Ye
    He, Jixuan
    Li, Wanhua
    Kim, Junsik
    Wei, Donglai
    Pfister, Hanspeter
    Chen, Chang Wen
    COMPUTER VISION - ECCV 2024, PT XLI, 2025, 15099 : 421 - 438
  • [10] VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding
    Xin, Yi
    Du, Junlong
    Wang, Qiang
    Lin, Zhiwen
    Yan, Ke
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 16085 - 16093