ReViT: Vision Transformer Accelerator With Reconfigurable Semantic-Aware Differential Attention

Cited by: 0
Authors
Zou, Xiaofeng [1]
Chen, Cen [1, 2]
Shao, Hongen [1]
Wang, Qinyu [1]
Zhuang, Xiaobin [1]
Li, Yangfan [3]
Li, Keqin [4]
Affiliations
[1] South China University of Technology, School of Future Technology, Guangzhou 510641, People's Republic of China
[2] Pazhou Lab, Guangzhou 510335, People's Republic of China
[3] Central South University, School of Computer Science and Engineering, Changsha 410083, People's Republic of China
[4] State University of New York at New Paltz, Department of Computer Science, New Paltz, NY 12561, USA
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Semantics; Transformers; Visualization; Computer vision; Computational modeling; Attention mechanisms; Dogs; Computers; Snow; Performance evaluation; Hardware accelerator; Vision transformers; Software-hardware co-design; Hierarchies
DOI
10.1109/TC.2024.3504263
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
While vision transformers (ViTs) continue to reach new milestones in computer vision, their complex network architectures, with high computation and memory costs, hinder deployment on resource-limited edge devices. Customized accelerators have been proposed to speed up ViT execution, improving performance while reducing energy consumption. However, these approaches use flattened attention mechanisms and ignore the hierarchical visual semantics inherent in images. In this work, we conduct a thorough analysis of hierarchical visual semantics in real-world images, revealing the opportunities and challenges of leveraging visual semantics to accelerate ViTs. We propose ReViT, a systematic algorithm and architecture co-design approach that exploits visual semantics to accelerate ViTs. The proposed algorithm exploits the strong feature similarity within the same semantic class to reduce computation and communication through a differential attention mechanism, and it supports semantic-aware attention efficiently. A novel dedicated architecture is designed to support the proposed algorithm and translate it into performance improvements. Moreover, we propose an efficient execution dataflow to alleviate workload imbalance and maximize hardware utilization. ReViT opens new directions for accelerating ViTs by exploring the underlying visual semantics of images. On average, ReViT achieves a 2.3x speedup and a 3.6x improvement in energy efficiency over state-of-the-art ViT accelerators.
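The abstract does not spell out the differential attention mechanism, so the following is only an illustrative sketch of the general idea it points to, not ReViT's actual algorithm. It assumes that each key vector carries a semantic class label and that keys in the same class are close to a per-class reference key. By linearity of the dot product, a score can then be computed as the reference score plus a correction from the (hopefully small) difference vector. The function names, the `labels` input, and the choice of the first key in each class as the reference are all hypothetical.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def direct_scores(query, keys):
    """Baseline: one full dot product per key."""
    return [dot(query, k) for k in keys]


def differential_scores(query, keys, labels):
    """Score each key as (reference score of its class) + query . (key - reference).

    Assumption: the first key seen for a label serves as that class's reference.
    The result equals the baseline up to floating-point rounding; any saving
    would come from the deltas being small or sparse, which this sketch does
    not exploit.
    """
    ref_key = {}    # label -> reference key vector
    ref_score = {}  # label -> query . reference key
    scores = []
    for key, label in zip(keys, labels):
        if label not in ref_key:
            ref_key[label] = key
            ref_score[label] = dot(query, key)
        delta = [x - r for x, r in zip(key, ref_key[label])]
        scores.append(ref_score[label] + dot(query, delta))
    return scores


if __name__ == "__main__":
    query = [1.0, 2.0, -1.0]
    keys = [[1.0, 0.0, 2.0], [1.0, 0.5, 2.0], [0.0, 3.0, 1.0]]
    labels = [0, 0, 1]  # hypothetical semantic class per key
    print(direct_scores(query, keys))
    print(differential_scores(query, keys, labels))
```

In this toy form the two functions do the same amount of arithmetic. A hardware design would presumably gain only when the difference vectors can be stored or processed more cheaply than full vectors, which is a further assumption beyond what the abstract states.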
Pages: 1079-1093
Page count: 15
Related papers
48 in total
  • [1] Semantic-aware Transformer for shadow detection
    Zhou, Kai
    Fang, Jing-Long
    Wu, Wen
    Shao, Yan-Li
    Wang, Xing-Qi
    Wei, Dan
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 240
  • [2] Semantic-Aware Dynamic Parameter for Video Inpainting Transformer
    Lee, Eunhye
    Yoo, Jinsu
    Yang, Yunjeong
    Baik, Sungyong
    Kim, Tae Hyun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 12903 - 12912
  • [3] SMART: Semantic-Aware Masked Attention Relational Transformer for Multi-label Image Recognition
    Wu, Hongjun
    Xu, Cheng
    Liu, Hongzhe
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2158 - 2162
  • [4] Multi-level semantic-aware transformer for image captioning
    Xu, Qin
    Song, Shan
    Wu, Qihang
    Jiang, Bo
    Luo, Bin
    Tang, Jinhui
    NEURAL NETWORKS, 2025, 187
  • [5] SaTransformer: Semantic-aware transformer for breast cancer classification and segmentation
    Zhang, Jie
    Zhang, Zhichao
    Liu, Hua
    Xu, Shiqiang
    IET IMAGE PROCESSING, 2023, 17 (13) : 3789 - 3800
  • [6] Attention-Aware and Semantic-Aware Network for RGB-D Indoor Semantic Segmentation
    Duan L.-J.
    Sun Q.-C.
    Qiao Y.-H.
    Chen J.-C.
    Cui G.-Q.
    Jisuanji Xuebao/Chinese Journal of Computers, 2021, 44 (02) : 275 - 291
  • [7] Hardware Accelerator for MobileViT Vision Transformer with Reconfigurable Computation
    Hsiao, Shen-Fu
    Chao, Tzu-Hsien
    Yuan, Yen-Che
    Chen, Kun-Chih
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024
  • [8] SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image Prediction
    Li, Zekun
    Liu, Yufan
    Li, Bing
    Feng, Bailan
    Wu, Kebin
    Peng, Chengwei
    Hu, Weiming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6160 - 6173
  • [9] SDPT: Semantic-Aware Dimension-Pooling Transformer for Image Segmentation
    Cao, Hu
    Chen, Guang
    Zhao, Hengshuang
    Jiang, Dongsheng
    Zhang, Xiaopeng
    Tian, Qi
    Knoll, Alois
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (11) : 15934 - 15946
  • [10] SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal
    Sun, Zhaoyang
    Chen, Yaxiong
    Xiong, Shengwu
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 2325 - 2334