ReViT: Enhancing vision transformers feature diversity with attention residual connections

Cited by: 3
Authors
Diko, Anxhelo [1 ]
Avola, Danilo [1 ]
Cascio, Marco [1 ,2 ]
Cinque, Luigi [1 ]
Affiliations
[1] Sapienza Univ Rome, Dept Comp Sci, Via Salaria 113, I-00198 Rome, Italy
[2] Univ Rome UnitelmaSapienza, Dept Law & Econ, Piazza Sassari 4, I-00161 Rome, Italy
Keywords
Vision transformer; Feature collapse; Self-attention mechanism; Residual attention learning; Visual recognition;
DOI
10.1016/j.patcog.2024.110853
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Vision Transformer (ViT) self-attention mechanism suffers from feature collapse in deeper layers, causing low-level visual features to vanish. However, such features help to accurately represent and identify elements within an image and increase the accuracy and robustness of vision-based recognition systems. Following this rationale, we propose a novel residual attention learning method for improving ViT-based architectures, increasing their visual feature diversity and model robustness. In this way, the proposed network can capture and preserve significant low-level features, providing more details about the elements within the scene being analyzed. The effectiveness and robustness of the presented method are evaluated on five image classification benchmarks, including ImageNet1k, CIFAR10, CIFAR100, Oxford Flowers-102, and Oxford-IIIT Pet, achieving improved performance. Additionally, experiments on the COCO2017 dataset show that the devised approach discovers and incorporates semantic and spatial relationships for object detection and instance segmentation when integrated into spatial-aware transformer models.
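To make the idea of residual attention learning concrete, the sketch below shows one way a ViT attention block could blend its attention scores with those of the preceding layer so that earlier, lower-level attention patterns are preserved through depth. This is a minimal illustrative sketch, not the paper's exact formulation: the learnable gate `alpha`, the choice to mix pre-softmax scores, and all module names are assumptions made here for clarity.

```python
import torch
import torch.nn as nn


class ResidualAttention(nn.Module):
    """Multi-head self-attention with a residual connection on the attention
    scores. The gate `alpha` and the mixing point (pre-softmax) are
    illustrative assumptions, not the published ReViT formulation."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        # Learnable coefficient blending current and previous attention scores.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x, prev_attn=None):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, heads, N, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # raw attention scores
        if prev_attn is not None:
            # Residual attention: mix in the previous layer's scores so that
            # low-level attention patterns are not lost in deeper layers.
            attn = self.alpha * attn + (1.0 - self.alpha) * prev_attn
        probs = attn.softmax(dim=-1)

        out = (probs @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out), attn                    # scores forwarded to next layer
```

In such a design, each encoder block would return its attention scores alongside its output and pass them to the next block, so the chain of residual connections lets early attention structure influence deeper layers and counteracts feature collapse.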
Pages: 13
Related papers
50 records in total
  • [41] Enhanced astronomical source classification with integration of attention mechanisms and vision transformers
    Bhavanam, Srinadh Reddy
    Channappayya, Sumohana S.
    Srijith, P. K.
    Desai, Shantanu
    ASTROPHYSICS AND SPACE SCIENCE, 2024, 369 (08)
  • [42] Introducing Attention Mechanism for EEG Signals: Emotion Recognition with Vision Transformers
    Arjun
    Rajpoot, Aniket Singh
    Panicker, Mahesh Raveendranatha
    2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC), 2021, : 5723 - 5726
  • [43] You Only Need Less Attention at Each Stage in Vision Transformers
    Zhang, Shuoxi
    Liu, Hanpeng
    Lin, Stephen
    He, Kun
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 6057 - 6066
  • [44] VSA: Learning Varied-Size Window Attention in Vision Transformers
    Zhang, Qiming
    Xu, Yufei
    Zhang, Jing
    Tao, Dacheng
    COMPUTER VISION, ECCV 2022, PT XXV, 2022, 13685 : 466 - 483
  • [45] Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
    Sahiner, Arda
    Ergen, Tolga
    Ozturkler, Batu
    Pauly, John
    Mardani, Morteza
    Pilanci, Mert
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 19050 - 19088
  • [46] UNestFormer: Enhancing Decoders and Skip Connections With Nested Transformers for Medical Image Segmentation
    Tayeb, Adnan Md
    Kim, Tae-Hyong
    IEEE ACCESS, 2024, 12 : 190996 - 191009
  • [47] Attention-Based Feature Fusion With External Attention Transformers for Breast Cancer Histopathology Analysis
    Vanitha, K.
    Manimaran, A.
    Chokkanathan, K.
    Anitha, K.
    Mahesh, T. R.
    Vinoth Kumar, V.
    Vivekananda, G. N.
    IEEE ACCESS, 2024, 12 : 126296 - 126312
  • [48] Distinct Attention Networks for Feature Enhancement and Suppression in Vision
    Bridwell, David A.
    Srinivasan, Ramesh
    PSYCHOLOGICAL SCIENCE, 2012, 23 (10) : 1151 - 1158
  • [49] PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers
    Grainger, Ryan
    Paniagua, Thomas
    Song, Xi
    Cuntoor, Naresh
    Lee, Mun Wai
    Wu, Tianfu
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 18568 - 18578
  • [50] Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
    Wei, Cong
    Duke, Brendan
    Jiang, Ruowei
    Aarabi, Parham
    Taylor, Graham W.
    Shkurti, Florian
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22680 - 22689