Adder Attention for Vision Transformer

Citations: 0
Authors
Shu, Han [1 ]
Wang, Jiahao [2 ]
Chen, Hanting [1 ,3 ]
Li, Lin [4 ]
Yang, Yujiu [2 ]
Wang, Yunhe [1 ]
Affiliations
[1] Huawei Noah's Ark Lab, Hong Kong, Peoples R China
[2] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[3] Peking Univ, Beijing, Peoples R China
[4] Huawei Technol, Shenzhen, Peoples R China
Keywords
DOI
N/A
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The transformer is a new computational paradigm for deep learning that has shown strong performance on a large variety of computer vision tasks. However, compared with conventional deep models (e.g., convolutional neural networks), vision transformers require more computational resources and cannot be easily deployed on mobile devices. To this end, we propose to reduce their energy consumption using adder neural networks (AdderNet). We first theoretically analyze the mechanism of self-attention and the difficulty of applying the adder operation to this module. Specifically, the feature diversity, i.e., the rank of the attention map, cannot be well preserved using only additions. Thus, we develop an adder attention layer that includes an additional identity mapping. With the new operation, vision transformers constructed using additions can also provide powerful feature representations. Experimental results on several benchmarks demonstrate that the proposed approach achieves performance highly competitive with that of the baselines while reducing energy consumption by roughly 2x to 3x.
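The mechanism the abstract describes can be sketched in a few lines: similarity is computed with additions only (a negative L1 distance between queries and keys replaces the dot product), and an identity mapping is added to the attention map to preserve its rank. The sketch below is a minimal NumPy illustration under our own assumptions; the function name `adder_attention`, the scaling factor `lam`, and the final matmul with `V` (which the actual AdderNet work would also replace with adder operations) are not taken from the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adder_attention(Q, K, V, lam=1.0):
    """Hypothetical sketch of an adder attention layer.

    scores[i, j] = -sum_k |Q[i, k] - K[j, k]|, i.e. the similarity
    uses only additions/subtractions instead of multiplications.
    An identity mapping scaled by `lam` (an assumed hyperparameter)
    is added to the attention map to preserve its rank, i.e. the
    feature diversity the abstract refers to.
    """
    n, d = Q.shape
    # Negative L1 distance between every query and key (additions only).
    scores = -np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1)
    attn = softmax(scores / np.sqrt(d))
    # Additional identity mapping keeps the attention map full-rank.
    attn = attn + lam * np.eye(n)
    # Aggregation step; kept as a matmul here purely for illustration.
    return attn @ V
```

Note that without the identity term, a softmax over near-uniform L1 scores can collapse toward a rank-one attention map; the added `lam * I` guarantees the map stays full-rank regardless of the scores.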
Pages: 11
Related Papers
50 records
  • [31] Patch attention convolutional vision transformer for facial expression recognition with occlusion
    Liu, Chang
    Hirota, Kaoru
    Dai, Yaping
    INFORMATION SCIENCES, 2023, 619 : 781 - 794
  • [32] Colorectal Polyp Segmentation Combining Pyramid Vision Transformer and Axial Attention
    Zhou, Xue
    Bai, Zhengyao
    Lu, Qianjie
    Fan, Shenglan
COMPUTER ENGINEERING AND APPLICATIONS, 2023, 59 (11) : 222 - 230
  • [33] Hierarchical attention vision transformer for fine-grained visual classification
    Hu, Xiaobin
    Zhu, Shining
    Peng, Taile
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 91
  • [34] Facial Expression Recognition Based on Vision Transformer with Hybrid Local Attention
    Tian, Yuan
    Zhu, Jingxuan
    Yao, Huang
    Chen, Di
    APPLIED SCIENCES-BASEL, 2024, 14 (15):
  • [35] ASAFormer: Visual tracking with convolutional vision transformer and asymmetric selective attention
    Gong, Xiaomei
    Zhang, Yi
    Hu, Shu
KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [36] Lightweight Vision Transformer with Spatial and Channel Enhanced Self-Attention
    Zheng, Jiahao
    Yang, Longqi
    Li, Yiying
    Yang, Ke
    Wang, Zhiyuan
    Zhou, Jun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 1484 - 1488
  • [37] DHFormer: A Vision Transformer-Based Attention Module for Image Dehazing
    Wasi, Abdul
    Shiney, O. Jeba
    COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT I, 2024, 2009 : 148 - 159
  • [38] A novel twin vision transformer framework for crop disease classification with deformable attention
    Padshetty, Smitha
    Ambika
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 105
  • [39] EViT: An Eagle Vision Transformer With Bi-Fovea Self-Attention
    Shi, Yulong
    Sun, Mingwei
    Wang, Yongshuai
    Ma, Jiahao
    Chen, Zengqiang
    IEEE TRANSACTIONS ON CYBERNETICS, 2025, 55 (03) : 1288 - 1300
  • [40] A novel hybrid attention gate based on vision transformer for the detection of surface defects
    Uzen, Hueseyin
    Turkoglu, Muammer
    Ozturk, Dursun
    Hanbay, Davut
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (10) : 6835 - 6851