Neighborhood Attention Transformer

Cited by: 115
Authors
Hassani, Ali [1 ,2 ]
Walton, Steven [1 ,2 ]
Li, Jiachen [1 ,2 ]
Li, Shen [4 ]
Shi, Humphrey [1 ,2 ,3 ]
Affiliations
[1] Univ Oregon, SHI Labs, Eugene, OR 97403 USA
[2] UIUC, Champaign, IL 61801 USA
[3] Picsart AI Res PAIR, New York, NY USA
[4] Meta Facebook AI, Menlo Pk, CA USA
DOI
10.1109/CVPR52729.2023.00599
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We present Neighborhood Attention (NA), the first efficient and scalable sliding-window attention mechanism for vision. NA is a pixel-wise operation that localizes self-attention (SA) to the nearest neighboring pixels, and therefore enjoys linear time and space complexity compared to the quadratic complexity of SA. The sliding-window pattern allows NA's receptive field to grow without needing extra pixel shifts, and preserves translational equivariance, unlike Swin Transformer's Window Self Attention (WSA). We develop NATTEN (Neighborhood Attention Extension), a Python package with efficient C++ and CUDA kernels, which allows NA to run up to 40% faster than Swin's WSA while using up to 25% less memory. We further present Neighborhood Attention Transformer (NAT), a new hierarchical transformer design based on NA that boosts image classification and downstream vision performance. Experimental results on NAT are competitive; NAT-Tiny reaches 83.2% top-1 accuracy on ImageNet, 51.4% mAP on MS-COCO, and 48.4% mIoU on ADE20K, improvements of 1.9% in ImageNet accuracy, 1.0% in COCO mAP, and 2.6% in ADE20K mIoU over a Swin model of similar size. To support more research based on sliding-window attention, we open-source our project and release our checkpoints.
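To make the mechanism in the abstract concrete, the following is a minimal, unoptimized sketch of single-head Neighborhood Attention in PyTorch: every query pixel attends only to the kernel_size x kernel_size window centred on it, with windows clamped (shifted inward) at the feature-map borders. The function and variable names (naive_neighborhood_attention, w_q, w_k, w_v) are illustrative assumptions, not the NATTEN API, and the learned relative positional bias used in the actual NAT models is omitted for brevity.

```python
# Illustrative sketch of single-head Neighborhood Attention (NA), not the
# optimized NATTEN C++/CUDA kernels: each query pixel attends to the
# kernel_size x kernel_size neighborhood centred on it, and windows are
# clamped at the borders so every query still sees exactly
# kernel_size**2 keys/values.
import torch
import torch.nn.functional as F


def naive_neighborhood_attention(x, w_q, w_k, w_v, kernel_size=7):
    """x: (H, W, C) feature map; w_q, w_k, w_v: (C, C) projection weights."""
    H, W, C = x.shape
    q = x @ w_q  # (H, W, C) queries
    k = x @ w_k  # (H, W, C) keys
    v = x @ w_v  # (H, W, C) values
    r = kernel_size // 2
    scale = C ** -0.5
    out = torch.empty_like(x)
    for i in range(H):
        # Shift the window inward near the borders so the neighborhood
        # never falls outside the feature map.
        i0 = min(max(i - r, 0), max(H - kernel_size, 0))
        for j in range(W):
            j0 = min(max(j - r, 0), max(W - kernel_size, 0))
            nk = k[i0:i0 + kernel_size, j0:j0 + kernel_size].reshape(-1, C)
            nv = v[i0:i0 + kernel_size, j0:j0 + kernel_size].reshape(-1, C)
            attn = F.softmax((q[i, j] @ nk.T) * scale, dim=-1)
            out[i, j] = attn @ nv
    return out


# Tiny usage example on a random 14 x 14 feature map with 32 channels.
if __name__ == "__main__":
    H, W, C = 14, 14, 32
    x = torch.randn(H, W, C)
    w_q, w_k, w_v = (torch.randn(C, C) * C ** -0.5 for _ in range(3))
    y = naive_neighborhood_attention(x, w_q, w_k, w_v, kernel_size=7)
    print(y.shape)  # torch.Size([14, 14, 32])
```

Because each of the H x W queries compares against only kernel_size**2 neighbors, the cost grows linearly with the number of pixels rather than quadratically, which is the complexity argument made in the abstract; the NATTEN package replaces this explicit double loop with fused C++ and CUDA kernels.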
Pages: 6185-6194
Page count: 10