LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

Cited by: 429
Authors
Graham, Ben
El-Nouby, Alaaeldin
Touvron, Hugo
Stock, Pierre
Joulin, Armand
Jegou, Herve
Douze, Matthijs
DOI
10.1109/ICCV48922.2021.01204
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular activation maps with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information in vision transformers. As a result, we propose LeViT: a hybrid neural network for fast inference image classification. We consider different measures of efficiency on different hardware platforms, so as to best reflect a wide range of application scenarios. Our extensive experiments empirically validate our technical choices and show they are suitable to most architectures. Overall, LeViT significantly outperforms existing convnets and vision transformers with respect to the speed/accuracy tradeoff. For example, at 80% ImageNet top-1 accuracy, LeViT is 5 times faster than EfficientNet on CPU. We release the code at https://github.com/facebookresearch/LeViT.
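The abstract describes the attention bias only as a way to integrate positional information into attention. As a rough illustration of that idea, the sketch below adds a position-dependent scalar to each attention logit before the softmax. Indexing the bias by the absolute row and column offset between two tokens, the name `attention_with_bias`, and the `bias_table` dictionary are all assumptions made for this example; the abstract does not specify them, and this is not the released implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_bias(q, k, v, bias_table, height, width):
    """Single-head attention over a height x width token grid.

    q, k, v: lists of per-token vectors in row-major order.
    bias_table: hypothetical dict mapping (|dy|, |dx|) to a scalar
    that is added to the attention logit for that token pair.
    """
    n = height * width
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(n):
        yi, xi = divmod(i, width)
        logits = []
        for j in range(n):
            yj, xj = divmod(j, width)
            dot = sum(a * b for a, b in zip(q[i], k[j]))
            # Positional information enters only through this additive bias.
            bias = bias_table[(abs(yi - yj), abs(xi - xj))]
            logits.append(dot * scale + bias)
        weights = softmax(logits)
        out.append([
            sum(w * v[j][d] for j, w in enumerate(weights))
            for d in range(len(v[0]))
        ])
    return out
```

With an all-zero bias table and identical queries and keys, every token attends uniformly to all others. A bias table that strongly favors the (0, 0) offset makes each token attend almost only to itself, which shows how the bias alone can encode locality.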
Pages: 12239-12249
Page count: 11