LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

Cited by: 429
Authors
Graham, Ben
El-Nouby, Alaaeldin
Touvron, Hugo
Stock, Pierre
Joulin, Armand
Jegou, Herve
Douze, Matthijs
DOI
10.1109/ICCV48922.2021.01204
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular activation maps with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information in vision transformers. As a result, we propose LeViT: a hybrid neural network for fast inference image classification. We consider different measures of efficiency on different hardware platforms, so as to best reflect a wide range of application scenarios. Our extensive experiments empirically validate our technical choices and show they are suitable to most architectures. Overall, LeViT significantly outperforms existing convnets and vision transformers with respect to the speed/accuracy tradeoff. For example, at 80% ImageNet top-1 accuracy, LeViT is 5 times faster than EfficientNet on CPU. We release the code at https://github.com/facebookresearch/LeViT.
Pages: 12239-12249
Page count: 11