ReViT: Vision Transformer Accelerator With Reconfigurable Semantic-Aware Differential Attention

Citations: 0
|
Authors
Zou, Xiaofeng [1 ]
Chen, Cen [1 ,2 ]
Shao, Hongen [1 ]
Wang, Qinyu [1 ]
Zhuang, Xiaobin [1 ]
Li, Yangfan [3 ]
Li, Keqin [4 ]
Affiliations
[1] South China Univ Technol, Sch Future Technol, Guangzhou 510641, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[4] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Semantics; Transformers; Visualization; Computer vision; Computational modeling; Attention mechanisms; Dogs; Computers; Snow; Performance evaluation; Hardware accelerator; vision transformers; software-hardware co-design; HIERARCHIES;
DOI
10.1109/TC.2024.3504263
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Discipline Classification Code
0812;
Abstract
While vision transformers (ViTs) have continued to achieve new milestones in computer vision, their complicated network architectures with high computation and memory costs have hindered their deployment on resource-limited edge devices. Some customized accelerators have been proposed to accelerate the execution of ViTs, achieving improved performance with reduced energy consumption. However, these approaches utilize flattened attention mechanisms and ignore the inherent hierarchical visual semantics in images. In this work, we conduct a thorough analysis of hierarchical visual semantics in real-world images, revealing opportunities and challenges of leveraging visual semantics to accelerate ViTs. We propose ReViT, a systematic algorithm and architecture co-design approach, which aims to exploit the visual semantics to accelerate ViTs. Our proposed algorithm can leverage the same semantic class with strong feature similarity to reduce computation and communication in a differential attention mechanism, and support the semantic-aware attention efficiently. A novel dedicated architecture is designed to support the proposed algorithm and translate it into performance improvements. Moreover, we propose an efficient execution dataflow to alleviate workload imbalance and maximize hardware utilization. ReViT opens new directions for accelerating ViTs by exploring the underlying visual semantics of images. ReViT gains an average of 2.3x speedup and 3.6x energy efficiency over state-of-the-art ViT accelerators.
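The abstract describes reusing attention results across tokens of the same semantic class that have strong feature similarity. As a rough illustration of that idea (not the actual ReViT algorithm; the grouping threshold `tau`, the mean-query representative, and the fallback path are all assumptions made for this sketch), one could compute a single attention row per semantic group and reuse it for tokens close to the group representative, falling back to a full attention row only for outliers:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_aware_attention(q, k, v, labels, tau=0.1):
    """Toy semantic-aware attention (illustrative only).

    q, k, v : (n, d) query/key/value matrices for n tokens.
    labels  : (n,) semantic class id per token (assumed given).
    tau     : similarity threshold; tokens within tau of their
              group's mean query reuse the group's attention output.
    """
    d = q.shape[-1]
    out = np.empty_like(v)
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        rep = q[idx].mean(axis=0)                        # group representative query
        rep_out = softmax(rep @ k.T / np.sqrt(d)) @ v    # one attention row per group
        for i in idx:
            if np.linalg.norm(q[i] - rep) < tau:         # strong similarity: reuse
                out[i] = rep_out
            else:                                        # outlier: full attention row
                out[i] = softmax(q[i] @ k.T / np.sqrt(d)) @ v
    return out
```

With a large `tau`, every token in a group shares one attention computation (maximum savings, lowest fidelity); with `tau = 0`, the sketch degenerates to standard full attention. The paper's actual differential mechanism, tiling, and dataflow are hardware co-designed and not captured here.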
Pages: 1079-1093
Page count: 15
Related Papers (48 total)
  • [31] A Semantic-Aware Attention and Visual Shielding Network for Cloth-Changing Person Re-Identification
    Gao, Zan
    Wei, Hongwei
    Guan, Weili
    Nie, Jie
    Wang, Meng
    Chen, Shengyong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 1243 - 1257
  • [32] HSACT: A hierarchical semantic-aware CNN-Transformer for remote sensing image spectral super-resolution
    Zhou, Chengle
    He, Zhi
    Zou, Liwei
    Li, Yunfei
    Plaza, Antonio
    NEUROCOMPUTING, 2025, 636
  • [33] An enhanced vision transformer with scale-aware and spatial-aware attention for thighbone fracture detection
    Guan, B.
    Yao, J.
    Zhang, G.
    NEURAL COMPUTING AND APPLICATIONS, 2024, 36 (19) : 11425 - 11438
  • [34] Graph-Segmenter: graph transformer with boundary-aware attention for semantic segmentation
    Wu, Zizhang
    Gan, Yuanzhu
    Xu, Tianhao
    Wang, Fan
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (05)
  • [36] Linguistically-aware attention for reducing the semantic gap in vision-language tasks
    Gouthaman, K. V.
    Nambiar, Athira
    Srinivas, Kancheti Sai
    Mittal, Anurag
    PATTERN RECOGNITION, 2021, 112
  • [37] DiVIT: Algorithm and architecture co-design of differential attention in vision transformer
    Li, Yangfan
    Hu, Yikun
    Wu, Fan
    Li, Kenli
    JOURNAL OF SYSTEMS ARCHITECTURE, 2022, 128
  • [38] Semantic-aware frame-event fusion based pattern recognition via large vision-language models
    Li, Dong
    Jin, Jiandong
    Zhang, Yuhao
    Zhong, Yanlin
    Wu, Yaoyang
    Chen, Lan
    Wang, Xiao
    Luo, Bin
    PATTERN RECOGNITION, 2025, 158
  • [39] SiamSEA: Semantic-Aware Enhancement and Associative-Attention Dual-Modal Siamese Network for Robust RGBT Tracking
    Zhuang, Zihan
    Yin, Mingfeng
    Gao, Qi
    Lin, Yong
    Hong, Xing
    IEEE ACCESS, 2024, 12 : 134874 - 134887
  • [40] SCTS: Instance Segmentation of Single Cells Using a Transformer-Based Semantic-Aware Model and Space-Filling Augmentation
    Zhou, Yating
    Li, Wenjing
    Yang, Ge
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5933 - 5942