Adder Attention for Vision Transformer

Cited by: 0
Authors
Shu, Han [1 ]
Wang, Jiahao [2 ]
Chen, Hanting [1 ,3 ]
Li, Lin [4 ]
Yang, Yujiu [2 ]
Wang, Yunhe [1 ]
Affiliations
[1] Huawei Noah's Ark Lab, Hong Kong, Peoples R China
[2] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
[4] Huawei Technol, Shenzhen, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The transformer is a new computational paradigm for deep learning that has shown strong performance on a large variety of computer vision tasks. However, compared with conventional deep models (e.g., convolutional neural networks), vision transformers require more computational resources and cannot be easily deployed on mobile devices. To this end, we propose to reduce their energy consumption using adder neural networks (AdderNet). We first theoretically analyze the mechanism of self-attention and the difficulty of applying adder operations to this module. Specifically, the feature diversity, i.e., the rank of the attention map, cannot be well preserved using only additions. We therefore develop an adder attention layer that includes an additional identity mapping. With the new operation, vision transformers constructed using additions can also provide powerful feature representations. Experimental results on several benchmarks demonstrate that the proposed approach achieves highly competitive performance relative to the baselines while reducing energy consumption by roughly 2~3x.
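The abstract's idea can be sketched in code. The sketch below is an illustrative reconstruction, not the authors' implementation: AdderNet-style layers replace multiplication-based similarity with addition-only operations, so here query-key similarity is measured with a negative L1 distance, and an identity mapping is added to the output to help preserve the rank (feature diversity) of the attention map. The `sqrt(d)` scaling, the exact placement of the identity term, and the use of an ordinary matrix product for the final aggregation (which the full AdderNet formulation would also replace) are all assumptions made for readability.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adder_attention(Q, K, V):
    """Illustrative adder-style self-attention (sketch, not the paper's exact layer).

    Similarity is the negative L1 distance between queries and keys,
    which uses only additions/subtractions, instead of the dot product
    of standard attention. An identity mapping (+ V) is added to the
    output, mirroring the paper's fix for the rank-collapse problem.
    Assumes self-attention, i.e. Q, K, V all have shape (n, d).
    """
    n, d = Q.shape
    # Negative L1 distance as similarity: sim[i, j] = -sum_k |Q[i,k] - K[j,k]|
    sim = -np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1) / np.sqrt(d)
    A = softmax(sim, axis=-1)   # attention map, rows sum to 1
    return A @ V + V            # aggregation plus identity mapping
```

For intuition: without the `+ V` term, a softmax over near-uniform L1 similarities tends to average all value rows together, collapsing the output toward a rank-one matrix; the identity term keeps each token's own features in the output.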
Pages: 11
Related Papers
50 records in total
  • [11] PARAMETER-EFFICIENT VISION TRANSFORMER WITH LINEAR ATTENTION
    Zhao, Youpeng
    Tang, Huadong
    Jiang, Yingying
    Yong, A.
    Wu, Qiang
    Wang, Jun
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1275 - 1279
  • [12] Fcaformer: Forward Cross Attention in Hybrid Vision Transformer
    Zhang, Haokui
    Hu, Wenze
    Wang, Xiaoyu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6037 - 6046
  • [13] BViT: Broad Attention-Based Vision Transformer
    Li, Nannan
    Chen, Yaran
    Li, Weifan
    Ding, Zixiang
    Zhao, Dongbin
    Nie, Shuai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12772 - 12783
  • [14] Efficient image analysis with triple attention vision transformer
    Li, Gehui
    Zhao, Tongtong
    PATTERN RECOGNITION, 2024, 150
  • [15] CONMW TRANSFORMER: A GENERAL VISION TRANSFORMER BACKBONE WITH MERGED-WINDOW ATTENTION
    Li, Ang
    Jiao, Jichao
    Li, Ning
    Qi, Wangjing
    Xu, Wei
    Pang, Min
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1551 - 1555
  • [16] Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
    Pan, Xuran
    Ye, Tianzhu
    Xia, Zhuofan
    Song, Shiji
    Huang, Gao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2082 - 2091
  • [17] Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
    Wu, Sitong
    Wu, Tianyi
    Tan, Haoru
    Guo, Guodong
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 2731 - 2739
  • [18] FSwin Transformer: Feature-Space Window Attention Vision Transformer for Image Classification
    Yoo, Dayeon
    Kim, Jeesu
    Yoo, Jinwoo
    IEEE ACCESS, 2024, 12 : 72598 - 72606
  • [19] BiFormer: Vision Transformer with Bi-Level Routing Attention
    Zhu, Lei
    Wang, Xinjiang
    Ke, Zhanghan
    Zhang, Wayne
    Lau, Rynson
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10323 - 10333
  • [20] SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
    Vani, Ankit
    Nguyen, Bac
    Lavoie, Samuel
    Krishna, Ranjay
    Courville, Aaron
    COMPUTER VISION - ECCV 2024, PT LXVI, 2025, 15124 : 233 - 251