ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning

Times cited: 0

Authors:

Pan, Junting [1]

Lin, Ziyi [1]

Zhu, Xiatian [2]

Shao, Jing [1]

Li, Hongsheng [1,3]

Affiliations:

[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China

[2] Univ Surrey, Surrey Inst People Ctr Artificial Intelligence, CVSSP, Surrey, England

[3] Ctr Perceptual & Interact Intelligence Ltd, Traverse City, MI USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022 | 2022

Keywords:

DOI:

Not available

CLC number (Chinese Library Classification):

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Capitalizing on large pre-trained models for various downstream tasks of interest has recently shown promising performance. Due to the ever-growing model size, the standard task-adaptation strategy based on full fine-tuning has become prohibitively costly in terms of model training and storage. This has led to a new research direction in parameter-efficient transfer learning. However, existing attempts typically focus on downstream tasks from the same modality (e.g., image understanding) as the pre-trained model. This is a limitation because, for some modalities (e.g., video understanding), strong pre-trained models with sufficient knowledge are scarce or unavailable. In this work, we investigate such a novel cross-modality transfer learning setting, namely parameter-efficient image-to-video transfer learning. To solve this problem, we propose a new Spatio-Temporal Adapter (ST-Adapter) for parameter-efficient fine-tuning per video task. With a built-in spatio-temporal reasoning capability in a compact design, ST-Adapter enables a pre-trained image model without temporal knowledge to reason about dynamic video content at a small (~8%) per-task parameter cost, requiring approximately 20 times fewer updated parameters than previous work. Extensive experiments on video action recognition tasks show that our ST-Adapter can match or even outperform the strong full fine-tuning strategy and state-of-the-art video models, whilst enjoying the advantage of parameter efficiency. Code and model are available at https://github.com/linziyi96/st-adapter.
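The abstract describes ST-Adapter only at a high level ("built-in spatio-temporal reasoning capability in a compact design"). Below is a minimal pure-Python sketch of how such a module could work. It assumes a bottleneck layout that this record does not state: a linear down-projection, a depthwise convolution along the frame axis (equivalent to a depthwise 3-D convolution with kernel size (K, 1, 1)), a GELU activation, a linear up-projection, and a residual connection. The function names, the activation and where it is applied, and the temporal-only kernel are illustrative assumptions. This is not the code released in the repository above.

```python
# Hypothetical sketch of a spatio-temporal bottleneck adapter.
# Layout, activation, kernel shape, and all names are assumptions,
# not the authors' implementation.
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]        # stored as (in_dim x out_dim) or (channels x K)
Clip = List[List[Vector]]         # [T frames][N tokens][channels]


def linear(v: Vector, w: Matrix, b: Vector) -> Vector:
    """y = v @ w + b, with w stored as (in_dim x out_dim)."""
    return [sum(v[i] * w[i][j] for i in range(len(v))) + b[j] for j in range(len(b))]


def gelu(x: float) -> float:
    """Exact GELU computed with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def depthwise_temporal_conv(h: Clip, kernel: Matrix, bias: Vector) -> Clip:
    """Per-channel 1-D convolution along the frame axis, zero-padded.

    Equivalent to a depthwise 3-D convolution with kernel size (K, 1, 1):
    each token mixes only with the same spatial position in nearby frames.
    """
    T, N, C = len(h), len(h[0]), len(h[0][0])
    K = len(kernel[0])
    pad = K // 2
    out: Clip = []
    for t in range(T):
        frame = []
        for n in range(N):
            token = []
            for c in range(C):
                acc = bias[c]
                for s in range(K):
                    src = t + s - pad
                    if 0 <= src < T:  # frames outside the clip count as zeros
                        acc += kernel[c][s] * h[src][n][c]
                token.append(acc)
            frame.append(token)
        out.append(frame)
    return out


def st_adapter(x: Clip, w_down: Matrix, b_down: Vector, conv_k: Matrix,
               conv_b: Vector, w_up: Matrix, b_up: Vector) -> Clip:
    """Hypothetical adapter: x + up(gelu(dwconv_t(down(x))))."""
    # 1) Project every token from d channels down to r bottleneck channels.
    h = [[linear(tok, w_down, b_down) for tok in frame] for frame in x]
    # 2) Mix information across frames, channel by channel.
    h = depthwise_temporal_conv(h, conv_k, conv_b)
    # 3) Activate, project back up to d channels, and add the residual.
    out: Clip = []
    for frame_x, frame_h in zip(x, h):
        out.append([
            [xi + yi for xi, yi in zip(tok, linear([gelu(a) for a in hid], w_up, b_up))]
            for tok, hid in zip(frame_x, frame_h)
        ])
    return out


def adapter_param_count(d: int, r: int, k: int) -> int:
    """Trainable parameters of one adapter: weights plus biases of all 3 layers."""
    return (d * r + r) + (r * k + r) + (r * d + d)
```

Tokens are mixed only by the depthwise temporal convolution. Each output token therefore depends on the same spatial position in neighboring frames, which gives a frame-wise image model access to motion. Starting from a zero up-projection (an assumed initialization) makes the adapter an identity mapping, so the frozen image model behaves exactly as before at the start of fine-tuning. Only the adapter weights, counted by `adapter_param_count`, would then be trained.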


Pages: 16

50 entries in total

[1] AiRs: Adapter in Remote Sensing for Parameter-Efficient Transfer Learning
Hu, Leiyi
Yu, Hongfeng
Lu, Wanxuan
Yin, Dongshuo
Sun, Xian
Fu, Kun
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 18
[2] Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Qing, Zhiwu
Zhang, Shiwei
Huang, Ziyuan
Zhang, Yingya
Gao, Changxin
Zhao, Deli
Sang, Nong
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 13888 - 13898
[3] VL-ADAPTER: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Sung, Yi-Lin
Cho, Jaemin
Bansal, Mohit
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5217 - 5227
[4] Parameter-Efficient Transfer Learning for NLP
Houlsby, Neil
Giurgiu, Andrei
Jastrzebski, Stanislaw
Morrone, Bruna
de Laroussilhe, Quentin
Gesmundo, Andrea
Attariyan, Mona
Gelly, Sylvain
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
[5] Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language
Liu, Yuqi
Xu, Luhui
Xiong, Pengfei
Jin, Qin
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023, : 1781 - 1789
[6] READ-PVLA: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
Nguyen, Thong
Wu, Xiaobao
Dong, Xinshuai
Le, Khoi
Hu, Zhiyuan
Nguyen, Cong-Duy
Ng, See-Kiong
Luu, Anh Tuan
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 18824 - 18832
[7] Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
Zhou, Xin
Liang, Dingkang
Xu, Wei
Zhu, Xingkui
Xu, Yihan
Zou, Zhikang
Bai, Xiang
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 14707 - 14717
[8] Parameter-Efficient Transfer Learning with Diff Pruning
Guo, Demi
Rush, Alexander M.
Kim, Yoon
59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4884 - 4896
[9] R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Liu, Ye
He, Jixuan
Li, Wanhua
Kim, Junsik
Wei, Donglai
Pfister, Hanspeter
Chen, Chang Wen
COMPUTER VISION - ECCV 2024, PT XLI, 2025, 15099 : 421 - 438
[10] VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding
Xin, Yi
Du, Junlong
Wang, Qiang
Lin, Zhiwen
Yan, Ke
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 16085 - 16093
