AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Times cited: 0
Authors
Chen, Shoufa [1 ]
Ge, Chongjian [1 ]
Tong, Zhan [2 ]
Wang, Jiangliu [2 ]
Song, Yibing [2 ]
Wang, Jue [2 ]
Luo, Ping [1 ]
Affiliations
[1] Univ Hong Kong, Hong Kong, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A natural next step is to adapt a pre-trained ViT to various image and video recognition tasks. This adaptation is challenging because of its heavy computation and memory storage costs: each model needs an independent and complete fine-tuning process for every task, which limits its transferability to different visual domains. To address this challenge, we propose AdaptFormer, an effective adaptation approach for Transformers that can efficiently adapt pre-trained ViTs to many different image and video tasks. It has several advantages over prior art. First, AdaptFormer introduces lightweight modules that add less than 2% extra parameters to a ViT, yet it increases the ViT's transferability without updating the original pre-trained parameters, significantly outperforming existing fully fine-tuned models (which update 100% of their parameters) on action recognition benchmarks. Second, it is plug-and-play across different Transformers and scales to many visual tasks. Third, extensive experiments on five image and video datasets show that AdaptFormer substantially improves ViTs in the target domains. For example, when updating just 1.5% extra parameters, it achieves about 10% and 19% relative improvement over fully fine-tuned models on Something-Something v2 and HMDB51, respectively. Code is available at https://github.com/ShoufaChen/AdaptFormer.
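The abstract says AdaptFormer adds lightweight modules while keeping the pre-trained ViT weights frozen, but it does not spell out what those modules look like. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a bottleneck adapter (down-projection, ReLU, up-projection, multiplied by a scaling factor `scale`) running in parallel with a frozen MLP block. All names (`BottleneckAdapter`, `adapted_mlp`, `adapter_param_fraction`) are hypothetical, LayerNorm is omitted, and plain Python lists stand in for tensors. The last function estimates the extra-parameter fraction, assuming ViT-B/16-like dimensions (width 768, 12 blocks, about 86M backbone parameters) and a bottleneck width of 64.

```python
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Return w @ x + b, where w is stored row-major as (out_dim x in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class BottleneckAdapter:
    """Hypothetical trainable branch: down-project -> ReLU -> up-project -> scale."""

    def __init__(self, dim: int, bottleneck: int, scale: float = 0.1, seed: int = 0):
        rng = random.Random(seed)
        self.scale = scale  # assumed scaling factor applied to the adapter output
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Zero-initialised up-projection: the branch starts out returning zeros,
        # so the adapted block initially matches the frozen pre-trained block.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def __call__(self, x: Vector) -> Vector:
        h = relu(matvec(self.w_down, x, self.b_down))
        return [self.scale * v for v in matvec(self.w_up, h, self.b_up)]

    def num_params(self) -> int:
        """Count the adapter's weights and biases (the only parameters that get trained)."""
        weights = sum(len(r) for r in self.w_down) + sum(len(r) for r in self.w_up)
        return weights + len(self.b_down) + len(self.b_up)


def adapted_mlp(x: Vector, frozen_mlp: Callable[[Vector], Vector],
                adapter: BottleneckAdapter) -> Vector:
    """Residual + frozen MLP output + adapter output, both branches fed the same x."""
    return [xi + m + a for xi, m, a in zip(x, frozen_mlp(x), adapter(x))]


def adapter_param_fraction(dim: int, bottleneck: int, depth: int,
                           backbone_params: int) -> float:
    """Extra parameters from one adapter per block, relative to the frozen backbone."""
    per_block = dim * bottleneck + bottleneck + bottleneck * dim + dim
    return depth * per_block / backbone_params


if __name__ == "__main__":
    # Assumed ViT-B/16-like numbers: width 768, 12 blocks, ~86M backbone parameters.
    frac = adapter_param_fraction(dim=768, bottleneck=64, depth=12, backbone_params=86_000_000)
    print(f"{frac:.2%}")  # prints 1.38%
```

Under these assumed settings the adapters add about 1.19M parameters, roughly 1.4% of the backbone, which is in line with the abstract's "less than 2%" figure; the exact percentage depends on the backbone and the chosen bottleneck width. Zero-initialising the up-projection (also an assumption here) means training starts from the unmodified pre-trained model, and only the adapter weights would receive gradient updates.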
Pages: 15