AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Cited by: 0

Authors:

Chen, Shoufa [1]

Ge, Chongjian [1]

Tong, Zhan [2]

Wang, Jiangliu [2]

Song, Yibing [2]

Wang, Jue [2]

Luo, Ping [1]

Affiliations:

[1] Univ Hong Kong, Hong Kong, Peoples R China

[2] Tencent AI Lab, Shenzhen, Peoples R China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and memory storage. Each model needs an independent and complete finetuning process to adapt to different tasks, which limits its transferability to different visual domains. To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks efficiently. It possesses several benefits more appealing than prior arts. Firstly, AdaptFormer introduces lightweight modules that only add less than 2% extra parameters to a ViT, while it is able to increase the ViT's transferability without updating its original pre-trained parameters, significantly outperforming the existing 100% fully fine-tuned models on action recognition benchmarks. Secondly, it can be plug-and-play in different Transformers and scalable to many visual tasks. Thirdly, extensive experiments on five image and video datasets show that AdaptFormer largely improves ViTs in the target domains. For example, when updating just 1.5% extra parameters, it achieves about 10% and 19% relative improvement compared to the fully fine-tuned models on Something-Something v2 and HMDB51, respectively. Code is available at https://github.com/ShoufaChen/AdaptFormer.
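The "less than 2% extra parameters" figure in the abstract can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it assumes the lightweight module is a bottleneck (a down-projection followed by an up-projection) added once per Transformer block, and it assumes ViT-Base-like sizes. The function name, the bottleneck width, and the backbone size are assumptions for this example, not values stated in this record.

```python
# Back-of-the-envelope parameter count for a bottleneck module added to each
# Transformer block. Every size below is an assumption for illustration.

def adapter_params(hidden_dim: int, bottleneck_dim: int) -> int:
    """Weights and biases of a down-projection followed by an up-projection."""
    down = hidden_dim * bottleneck_dim + bottleneck_dim
    up = bottleneck_dim * hidden_dim + hidden_dim
    return down + up

HIDDEN_DIM = 768              # assumed: ViT-Base token width
BOTTLENECK_DIM = 64           # assumed: bottleneck width
NUM_BLOCKS = 12               # assumed: ViT-Base depth
BACKBONE_PARAMS = 86_000_000  # assumed: approximate ViT-Base parameter count

extra = NUM_BLOCKS * adapter_params(HIDDEN_DIM, BOTTLENECK_DIM)
ratio = extra / BACKBONE_PARAMS
print(f"extra parameters: {extra:,} ({ratio:.2%} of the backbone)")
# prints "extra parameters: 1,189,632 (1.38% of the backbone)"
```

Under these assumed sizes the added modules stay below the 2% bound quoted in the abstract; a task-specific classification head, which is not counted here, would raise the figure slightly.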

Pages: 15
