ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

Cited by: 0
Authors
Pan, Junting [1]
Lin, Ziyi [1]
Zhu, Xiatian [2]
Shao, Jing [1]
Li, Hongsheng [1, 3]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Univ Surrey, Surrey Inst People Ctr Artificial Intelligence, CVSSP, Surrey, England
[3] Ctr Perceptual & Interact Intelligence Ltd, Traverse City, MI USA
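Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract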
Capitalizing on large pre-trained models for various downstream tasks of interest has recently emerged with promising performance. Due to the ever-growing model size, the standard full fine-tuning based task adaptation strategy becomes prohibitively costly in terms of model training and storage. This has led to a new research direction in parameter-efficient transfer learning. However, existing attempts typically focus on downstream tasks from the same modality (e.g., image understanding) as the pre-trained model. This is limiting because in some modalities (e.g., video understanding), a comparably strong pre-trained model with sufficient knowledge is less available or not available at all. In this work, we investigate such a novel cross-modality transfer learning setting, namely parameter-efficient image-to-video transfer learning. To solve this problem, we propose a new Spatio-Temporal Adapter (ST-Adapter) for parameter-efficient fine-tuning per video task. With a built-in spatio-temporal reasoning capability in a compact design, ST-Adapter enables a pre-trained image model without temporal knowledge to reason about dynamic video content at a small (~8%) per-task parameter cost, requiring approximately 20 times fewer updated parameters compared to previous work. Extensive experiments on video action recognition tasks show that our ST-Adapter can match or even outperform the strong full fine-tuning strategy and state-of-the-art video models, whilst enjoying the advantage of parameter efficiency. Code and model are available at https://github.com/linziyi96/st-adapter
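The "~8% per-task parameter cost" can be made concrete with back-of-the-envelope arithmetic. The sketch below is illustrative only. The abstract does not give the adapter layout or any sizes, so everything here is an assumption: a bottleneck adapter (linear down-projection, depthwise temporal convolution, linear up-projection), one adapter per Transformer layer, a 12-layer backbone of width 768 with roughly 86M parameters, and a bottleneck width of 384. The function names `adapter_params` and `per_task_cost` are hypothetical.

```python
# Hypothetical parameter-count sketch. All sizes are assumptions chosen for
# illustration; none of them come from this record.

def adapter_params(d_model: int, d_bottleneck: int, kernel_t: int = 3) -> int:
    """Count the parameters of one assumed bottleneck adapter."""
    # Linear down-projection with bias: d_model -> d_bottleneck.
    down = d_model * d_bottleneck + d_bottleneck
    # Depthwise temporal convolution with bias: one kernel_t-tap filter per channel.
    temporal = d_bottleneck * kernel_t + d_bottleneck
    # Linear up-projection with bias: d_bottleneck -> d_model.
    up = d_bottleneck * d_model + d_model
    return down + temporal + up


def per_task_cost(n_layers: int, d_model: int, d_bottleneck: int,
                  backbone_params: int) -> tuple[int, float]:
    """Return (added parameters, fraction of the frozen backbone size)."""
    added = n_layers * adapter_params(d_model, d_bottleneck)
    return added, added / backbone_params


if __name__ == "__main__":
    # Assumed backbone: 12 layers, width 768, about 86M parameters.
    added, fraction = per_task_cost(12, 768, 384, 86_000_000)
    print(f"added parameters per task: {added:,}")
    print(f"fraction of backbone: {fraction:.1%}")
```

Under these assumed sizes, the adapters add about 7.1M trainable parameters per task, roughly 8% of the frozen backbone, which is the order of magnitude the abstract reports. Only the adapter weights need to be stored per task, while the backbone is shared.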
Pages: 16