ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

Cited by: 0
Authors
Pan, Junting [1]
Lin, Ziyi [1]
Zhu, Xiatian [2]
Shao, Jing [1]
Li, Hongsheng [1, 3]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Univ Surrey, Surrey Inst People Ctr Artificial Intelligence, CVSSP, Surrey, England
[3] Ctr Perceptual & Interact Intelligence Ltd, Traverse City, MI USA
Keywords: none listed
DOI: not available
Chinese Library Classification: TP18 [Artificial intelligence theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Capitalizing on large pre-trained models for various downstream tasks of interest has recently emerged as an approach with promising performance. Due to the ever-growing model size, the standard task-adaptation strategy based on full fine-tuning becomes prohibitively costly in terms of model training and storage. This has led to a new research direction in parameter-efficient transfer learning. However, existing attempts typically focus on downstream tasks from the same modality as the pre-trained model (e.g., image understanding). This is a limitation because, for some modalities (e.g., video understanding), strong pre-trained models with sufficient knowledge are scarce or unavailable. In this work, we investigate this novel cross-modality transfer learning setting, namely parameter-efficient image-to-video transfer learning. To solve this problem, we propose a new Spatio-Temporal Adapter (ST-Adapter) for parameter-efficient fine-tuning per video task. With a built-in spatio-temporal reasoning capability in a compact design, ST-Adapter enables a pre-trained image model without temporal knowledge to reason about dynamic video content at a small (~8%) per-task parameter cost, requiring approximately 20 times fewer updated parameters than previous work. Extensive experiments on video action recognition tasks show that our ST-Adapter can match or even outperform the strong full fine-tuning strategy and state-of-the-art video models, whilst retaining the advantage of parameter efficiency. Code and model are available at https://github.com/linziyi96/st-adapter
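The abstract describes the adapter only at a high level. The sketch below illustrates one plausible bottleneck design under stated assumptions, to show how a small module can add temporal reasoning to a frozen image model: a down-projection, a depthwise convolution along the time axis (kernel size 3, zero padding), a ReLU, an up-projection, and a residual connection. The layer order, kernel size, activation, and the names `st_adapter_forward` and `adapter_param_count` are assumptions for illustration, not the authors' implementation (the linked repository holds that). Plain Python is used so the sketch has no dependencies.

```python
from typing import List

Matrix = List[List[float]]


def matmul_vec(x: List[float], w: Matrix) -> List[float]:
    # Row vector times matrix: x has len(w) entries, w is (in_dim x out_dim).
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def st_adapter_forward(frames: List[Matrix], w_down: Matrix,
                       conv_k: Matrix, w_up: Matrix) -> List[Matrix]:
    """Hypothetical bottleneck spatio-temporal adapter (biases omitted).

    frames: T x N x d token features (T frames, N tokens per frame).
    w_down: d x r down-projection; conv_k: r x 3 depthwise temporal kernel;
    w_up: r x d up-projection. The ReLU is an assumed choice of activation.
    """
    T, N = len(frames), len(frames[0])
    r = len(w_down[0])
    # 1) Down-project every token independently: d -> r.
    z = [[matmul_vec(tok, w_down) for tok in frame] for frame in frames]
    out = []
    for t in range(T):
        frame_out = []
        for n in range(N):
            # 2) Depthwise conv over time with zero padding: channel c of token n
            #    mixes only channel c of token n at frames t-1, t and t+1.
            mixed = []
            for c in range(r):
                acc = 0.0
                for k, dt in enumerate((-1, 0, 1)):
                    if 0 <= t + dt < T:
                        acc += conv_k[c][k] * z[t + dt][n][c]
                mixed.append(max(acc, 0.0))  # 3) ReLU (assumed activation).
            # 4) Up-project r -> d and add the residual (identity) path.
            delta = matmul_vec(mixed, w_up)
            frame_out.append([a + b for a, b in zip(frames[t][n], delta)])
        out.append(frame_out)
    return out


def adapter_param_count(d: int, r: int, kt: int = 3) -> int:
    # Down linear (weights + bias) + depthwise conv (weights + bias)
    # + up linear (weights + bias), for the hypothetical design above.
    return (d * r + r) + (r * kt + r) + (r * d + d)


if __name__ == "__main__":
    # Per-adapter parameter count for a hypothetical d = 768, r = 192.
    print(adapter_param_count(768, 192))
```

If the up-projection is initialised to zero, this adapter starts as an identity mapping, so the frozen image model behaves exactly as before at the start of fine-tuning, while the temporal kernel lets each token see its neighbours in adjacent frames. For the hypothetical width d = 768 and bottleneck r = 192, `adapter_param_count(768, 192)` gives 296,640 parameters per adapter; how such a count relates to the reported ~8% figure depends on the backbone and on how many adapters are inserted, which this record does not state.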
Pages: 16