Adder Attention for Vision Transformer

Citations: 0
Authors
Shu, Han [1 ]
Wang, Jiahao [2 ]
Chen, Hanting [1 ,3 ]
Li, Lin [4 ]
Yang, Yujiu [2 ]
Wang, Yunhe [1 ]
Affiliations
[1] Huawei Noah's Ark Lab, Hong Kong, Peoples R China
[2] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
[4] Huawei Technol, Shenzhen, Peoples R China
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer is a new calculation paradigm for deep learning that has shown strong performance on a wide variety of computer vision tasks. However, compared with conventional deep models (e.g., convolutional neural networks), vision transformers require more computational resources and cannot be easily deployed on mobile devices. To this end, we propose to reduce their energy consumption using adder neural networks (AdderNet). We first theoretically analyze the mechanism of self-attention and the difficulty of applying the adder operation to this module. Specifically, the feature diversity, i.e., the rank of the attention map, cannot be well preserved using only additions. We therefore develop an adder attention layer that includes an additional identity mapping. With the new operation, vision transformers constructed using additions can also provide powerful feature representations. Experimental results on several benchmarks demonstrate that the proposed approach achieves performance highly competitive with that of the baselines while reducing energy consumption by about 2~3x.
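The adder attention layer described in the abstract can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: it assumes a negative, scaled L1 distance between queries and keys as the addition-only similarity, and a hypothetical knob `alpha` controlling the additional identity mapping on the attention map; the paper's exact formulation (normalization, placement of the identity term, fully multiplication-free aggregation) may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adder_attention(Q, K, V, alpha=1.0):
    """Sketch of adder-style attention: similarity is computed with
    additions/subtractions only (negative L1 distance between queries
    and keys, as in AdderNet), and an identity mapping scaled by `alpha`
    (a hypothetical parameter) is added to the attention map to help
    preserve its rank, i.e. the feature diversity of the output."""
    n, d = Q.shape
    # (n, n) matrix of negative L1 distances, scaled like dot-product attention
    sim = -np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1) / np.sqrt(d)
    # softmax over L1 similarities tends toward a near-uniform, low-diversity
    # map; the added identity keeps the attention matrix full rank
    attn = softmax(sim, axis=-1) + alpha * np.eye(n)
    # aggregation kept as an ordinary matmul here for clarity of the sketch
    return attn @ V
```

With `alpha = 0` the layer reduces to plain L1-similarity attention; a positive `alpha` injects each token's own features back into its output, which is the identity-mapping idea the abstract motivates.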
Pages: 11