AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

Cited by: 0
Authors
Chen, Shoufa [1]
Ge, Chongjian [1]
Tong, Zhan [2]
Wang, Jiangliu [2]
Song, Yibing [2]
Wang, Jue [2]
Luo, Ping [1]
Affiliations
[1] Univ Hong Kong, Hong Kong, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
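Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract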
Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A following scenario is to adapt a ViT to various image and video recognition tasks. The adaptation is challenging because of heavy computation and memory storage. Each model needs an independent and complete finetuning process to adapt to different tasks, which limits its transferability to different visual domains. To address this challenge, we propose an effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks efficiently. It possesses several benefits more appealing than prior arts. Firstly, AdaptFormer introduces lightweight modules that only add less than 2% extra parameters to a ViT, while it is able to increase the ViT's transferability without updating its original pre-trained parameters, significantly outperforming the existing 100% fully fine-tuned models on action recognition benchmarks. Secondly, it can be plug-and-play in different Transformers and scalable to many visual tasks. Thirdly, extensive experiments on five image and video datasets show that AdaptFormer largely improves ViTs in the target domains. For example, when updating just 1.5% extra parameters, it achieves about 10% and 19% relative improvement compared to the fully fine-tuned models on Something-Something v2 and HMDB51, respectively. Code is available at https://github.com/ShoufaChen/AdaptFormer.
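The "less than 2% extra parameters" figure in the abstract can be sanity-checked with simple arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the abstract: a ViT-B-sized encoder (width 768, 12 blocks, MLP ratio 4), and one bottleneck module per block made of a down-projection and an up-projection with a hypothetical bottleneck width of 64. Embedding and classification-head parameters are ignored, and all function names are made up for this example.

```python
# Rough parameter-count check for adding one bottleneck module per encoder block.
# All sizes below are assumptions for illustration, not values from the abstract.

def linear_params(d_in, d_out):
    """Weights plus biases of a fully connected layer."""
    return d_in * d_out + d_out

def vit_block_params(dim, mlp_ratio=4):
    """Parameters of one standard Transformer encoder block."""
    hidden = dim * mlp_ratio
    attention = linear_params(dim, 3 * dim) + linear_params(dim, dim)  # qkv + output projection
    mlp = linear_params(dim, hidden) + linear_params(hidden, dim)
    layer_norms = 2 * (2 * dim)  # two LayerNorms, each with scale and shift
    return attention + mlp + layer_norms

def bottleneck_params(dim, bottleneck):
    """Down-projection followed by up-projection (hypothetical module layout)."""
    return linear_params(dim, bottleneck) + linear_params(bottleneck, dim)

DIM, DEPTH, BOTTLENECK = 768, 12, 64  # assumed ViT-B-like sizes and bottleneck width

backbone = DEPTH * vit_block_params(DIM)          # frozen pre-trained parameters
added = DEPTH * bottleneck_params(DIM, BOTTLENECK)  # the only parameters that are trained
ratio = added / backbone

print(f"frozen encoder parameters: {backbone:,}")
print(f"added trainable parameters: {added:,}")
print(f"added / frozen: {ratio:.2%}")
```

Under these assumptions the added modules stay well below 2% of the encoder size, which is in line with the abstract's claim; a different bottleneck width or backbone would change the ratio.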
Pages: 15
Related papers
50 items in total
  • [21] SDViT: Stacking of Distilled Vision Transformers for Hand Gesture Recognition
    Tan, Chun Keat
    Lim, Kian Ming
    Lee, Chin Poo
    Chang, Roy Kwang Yang
    Alqahtani, Ali
    APPLIED SCIENCES-BASEL, 2023, 13 (22)
  • [22] Self-Supervised Vision Transformers for Scalable Anomaly Detection over Images
    Samele, Stefano
    Matteucci, Matteo
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024
  • [23] ADAPTING COMPUTER VISION SYSTEMS TO THE VISUAL ENVIRONMENT - TOPOGRAPHIC MAPPING
    ZIELKE, T
    STORJOHANN, K
    MALLOT, HA
    VONSEELEN, W
    LECTURE NOTES IN COMPUTER SCIENCE, 1990, 427 : 613 - 615
  • [24] Artwork Style Recognition Using Vision Transformers and MLP Mixer
    Iliadis, Lazaros Alexios
    Nikolaidis, Spyridon
    Sarigiannidis, Panagiotis
    Wan, Shaohua
    Goudos, Sotirios K.
    TECHNOLOGIES, 2022, 10 (01)
  • [25] AdapterHub: A Framework for Adapting Transformers
    Pfeiffer, Jonas
    Ruckle, Andreas
    Poth, Clifton
    Kamath, Aishwarya
    Vulic, Ivan
    Ruder, Sebastian
    Cho, Kyunghyun
    Gurevych, Iryna
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING: SYSTEM DEMONSTRATIONS, 2020 : 46 - 54
  • [26] Facial Expression Recognition With Visual Transformers and Attentional Selective Fusion
    Ma, Fuyan
    Sun, Bin
    Li, Shutao
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (02) : 1236 - 1248
  • [27] Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency
    Prabhu, Viraj
    Yenamandra, Sriram
    Singh, Aaditya
    Hoffman, Judy
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [28] Domain-Adaptive Vision Transformers for Generalizing Across Visual Domains
    Cho, Yunsung
    Yun, Jungmin
    Kwon, Junehyoung
    Kim, Youngbin
    IEEE ACCESS, 2023, 11 : 115644 - 115653
  • [29] Vision Transformers are Parameter-Efficient Audio-Visual Learners
    Lin, Yan-Bo
    Sung, Yi-Lin
    Lei, Jie
    Bansal, Mohit
    Bertasius, Gedas
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023 : 2299 - 2309
  • [30] How Does Attention Work in Vision Transformers? A Visual Analytics Attempt
    Li, Yiran
    Wang, Junpeng
    Dai, Xin
    Wang, Liang
    Yeh, Chin-Chia Michael
    Zheng, Yan
    Zhang, Wei
    Ma, Kwan-Liu
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2023, 29 (06) : 2888 - 2900