Adaptive search for broad attention based vision transformers

Cited by: 0
Authors
Li, Nannan [1 ,2 ,3 ]
Chen, Yaran [1 ,2 ]
Zhao, Dongbin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Systems, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Tsinghua Univ, Dept Automat, BNRIST, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Vision transformer; Adaptive architecture search; Broad search space; Image classification; Broad learning;
DOI
10.1016/j.neucom.2024.128696
CLC classification number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Vision Transformers (ViTs) have recently prevailed in computer vision tasks owing to their powerful image-representation capability. Frustratingly, manually designing efficient ViT architectures is laborious, often involving repetitive trial and error. Furthermore, the exploration of lightweight ViTs remains limited, resulting in inferior performance compared with convolutional neural networks. To tackle these challenges, we propose Adaptive Search for Broad attention based Vision Transformers (ASB), which automates the design of efficient ViT architectures by combining a broad search space with an adaptive evolutionary algorithm. The broad search space facilitates the exploration of a novel connection paradigm, enabling more comprehensive integration of attention information to improve ViT performance. In addition, the adaptive evolutionary algorithm explores architectures efficiently by dynamically learning the probability distribution of candidate operators. Our experiments demonstrate that the adaptive evolution in ASB efficiently learns excellent lightweight models, converging 55% faster than a traditional evolutionary algorithm. The effectiveness of ASB is further validated across several visual tasks: on ImageNet classification, the searched model attains 77.8% accuracy with 6.5M parameters, outperforming state-of-the-art models including EfficientNet and EfficientViT; on mobile COCO panoptic segmentation it delivers 43.7% PQ; and on mobile ADE20K semantic segmentation it attains 40.9% mIoU. The code and pre-trained models will be available soon in ASB-Code.
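The adaptive evolutionary search described in the abstract, which learns a probability distribution over candidate operators and samples architectures from it, can be illustrated with a minimal sketch. The operator names, layer count, fitness proxy, and update rule below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code) of an evolutionary search that adapts
# a per-layer categorical distribution over candidate operators.
import random

CANDIDATE_OPS = ["broad_attn", "local_attn", "mlp", "identity"]  # hypothetical operator set
NUM_LAYERS = 4        # depth of the searched block (assumption)
POP_SIZE = 16         # architectures sampled per generation
GENERATIONS = 30
LEARNING_RATE = 0.2   # how fast the sampling distribution adapts

def toy_fitness(arch):
    """Stand-in for the validation accuracy of a trained sub-network."""
    # Reward a (made-up) preferred operator so the toy search converges.
    return sum(op == "broad_attn" for op in arch) + random.gauss(0, 0.1)

# Start from a uniform categorical distribution for every layer.
probs = [[1.0 / len(CANDIDATE_OPS)] * len(CANDIDATE_OPS) for _ in range(NUM_LAYERS)]

for gen in range(GENERATIONS):
    # Sample a population of architectures from the current distribution.
    population = [
        [random.choices(CANDIDATE_OPS, weights=probs[l])[0] for l in range(NUM_LAYERS)]
        for _ in range(POP_SIZE)
    ]
    scored = sorted(population, key=toy_fitness, reverse=True)
    elites = scored[: POP_SIZE // 4]  # keep the top quarter

    # Pull each layer's distribution toward the operator frequencies of the elites.
    for l in range(NUM_LAYERS):
        counts = [sum(a[l] == op for a in elites) for op in CANDIDATE_OPS]
        total = sum(counts)
        for i in range(len(CANDIDATE_OPS)):
            target = counts[i] / total
            probs[l][i] = (1 - LEARNING_RATE) * probs[l][i] + LEARNING_RATE * target

best = max(population, key=toy_fitness)
print("best sampled architecture:", best)
```

In the paper's setting, the fitness would be the evaluated accuracy of a sampled ViT sub-network within the broad search space; biasing sampling toward high-fitness operators is the kind of distribution learning to which the abstract attributes the 55% faster convergence over a traditional evolutionary algorithm.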
Pages: 12
Related papers
50 items in total
  • [31] Arjun; Rajpoot, Aniket Singh; Panicker, Mahesh Raveendranatha. Introducing Attention Mechanism for EEG Signals: Emotion Recognition with Vision Transformers. 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2021: 5723-5726.
  • [32] Zhang, Shuoxi; Liu, Hanpeng; Lin, Stephen; He, Kun. You Only Need Less Attention at Each Stage in Vision Transformers. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024: 6057-6066.
  • [33] Diko, Anxhelo; Avola, Danilo; Cascio, Marco; Cinque, Luigi. ReViT: Enhancing vision transformers feature diversity with attention residual connections. Pattern Recognition, 2024, 156.
  • [34] Zhang, Qiming; Xu, Yufei; Zhang, Jing; Tao, Dacheng. VSA: Learning Varied-Size Window Attention in Vision Transformers. Computer Vision, ECCV 2022, Pt XXV, 2022, 13685: 466-483.
  • [35] Sahiner, Arda; Ergen, Tolga; Ozturkler, Batu; Pauly, John; Mardani, Morteza; Pilanci, Mert. Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers. International Conference on Machine Learning, Vol 162, 2022: 19050-19088.
  • [36] Peng, Fei; Meng, Shao-hua; Long, Min. Presentation attack detection based on two-stream vision transformers with self-attention fusion. Journal of Visual Communication and Image Representation, 2022, 85.
  • [37] Aghamohammadesmaeilketabforoosh, Kimia; Nikan, Soodeh; Antonini, Giorgio; Pearce, Joshua M. Optimizing Strawberry Disease and Quality Detection with Vision Transformers and Attention-Based Convolutional Neural Networks. Foods, 2024, 13 (12).
  • [38] Gou, Chao; Zhou, Yuchen; Li, Dan. Driver attention prediction based on convolution and transformers. The Journal of Supercomputing, 2022, 78 (06): 8268-8284.
  • [40] Grainger, Ryan; Paniagua, Thomas; Song, Xi; Cuntoor, Naresh; Lee, Mun Wai; Wu, Tianfu. PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 18568-18578.