Enhancing the Utilization of Processing Elements in Spatial Deep Neural Network Accelerators

Cited by: 4
Authors
Asadikouhanjani, Mohammadreza [1 ]
Ko, Seok-Bum [1 ]
Affiliation
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Dataflow; deep neural network (DNN); negative output feature; processing element (PE); slack time; zero skipping;
DOI
10.1109/TCAD.2020.3031240
CLC classification
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Equipping mobile platforms with deep learning applications is highly valuable: such platforms can provide healthcare services in remote areas, improve privacy, and lower the required communication bandwidth. Designing an efficient computation engine enhances the performance of these platforms when running deep neural networks (DNNs). Energy-efficient DNN accelerators prune computations through sparsity (zero) skipping and early detection of negative output features. Spatial DNN accelerators can, in principle, support such computation-pruning techniques better than other common architectures such as systolic arrays. However, they need a separate, high-bandwidth data-distribution fabric, such as buses or trees, to run these techniques efficiently and avoid network-on-chip (NoC) stalls. Spatial designs also suffer from divergence and unequal work distribution, so applying computation-pruning techniques to a spatial design still causes stalls inside the computation engine, even when it is equipped with an NoC that provides high bandwidth to the processing elements (PEs). In a spatial architecture, PEs that finish their tasks earlier have slack time relative to the others. In this article, we propose an architecture with negligible area overhead that shares the scratchpads among the PEs in a novel way to exploit the slack time created by computation-pruning techniques or by the NoC format. With the proposed dataflow, a spatial engine benefits more efficiently from computation-pruning and data-reuse techniques. Compared to the reference design, the proposed method achieves a 1.24x speedup and 1.18x higher energy efficiency per inference.
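For readers unfamiliar with the two pruning mechanisms the abstract refers to, the following minimal Python sketch illustrates the idea behind zero skipping and early negative-output detection for a single output feature feeding a ReLU. It is an illustration only, not the authors' hardware dataflow; the function name pe_mac and the precomputed bound remaining_max are hypothetical assumptions made for this example.

def pe_mac(weights, activations, remaining_max):
    """Accumulate one output feature, pruning work where possible.

    weights, activations: equal-length sequences of numbers for one output.
    remaining_max[i]: assumed precomputed upper bound on the sum of the
    products not yet accumulated after step i. Once the running sum plus
    this bound is negative, the post-ReLU output is guaranteed to be zero,
    so the remaining MACs can be skipped (early negative output detection).
    """
    acc = 0.0
    for i, (w, a) in enumerate(zip(weights, activations)):
        if w == 0.0 or a == 0.0:            # zero skipping: no MAC issued
            continue
        acc += w * a                        # the multiply-accumulate itself
        if acc + remaining_max[i] < 0.0:    # result can no longer turn positive
            return 0.0                      # ReLU would clamp it to zero anyway
    return max(acc, 0.0)                    # ReLU

# Example: the PE skips the zero weight, and after a single MAC the bound
# shows the final sum must stay negative, so it returns 0.0 early. The cycles
# saved by such early exits are the slack time that the proposed
# scratchpad-sharing dataflow tries to put to use.
print(pe_mac([0.0, -2.0, 1.0], [3.0, 4.0, 0.5], [8.5, 0.5, 0.0]))  # -> 0.0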
Pages: 1947-1951
Number of pages: 5
Related papers
50 records in total
  • [21] LISA: Graph Neural Network based Portable Mapping on Spatial Accelerators
    Li, Zhaoying
    Wu, Dan
    Wijerathne, Dhananjaya
    Mitra, Tulika
    2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), 2022, : 444 - 459
  • [22] Spatial Data Dependence Graph Simulator for Convolutional Neural Network Accelerators
    Wang, Jooho
    Kim, Jiwon
    Moon, Sungmin
    Kim, Sunwoo
    Park, Sungkyung
    Park, Chester Sungchung
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 309 - 310
  • [23] Joint Protection Scheme for Deep Neural Network Hardware Accelerators and Models
    Zhou, Jingbo
    Zhang, Xinmiao
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2023, 42 (12) : 4518 - 4527
  • [24] Adaptable Approximation Based on Bit Decomposition for Deep Neural Network Accelerators
    Soliman, Taha
    De la Parra, Cecilia
    Guntoro, Andre
    Wehn, Norbert
    2021 IEEE 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS), 2021,
  • [25] A Novel Heuristic Neuron Grouping Algorithm for Deep Neural Network Accelerators
    Cakin, Alperen
    Dilek, Selma
    Tosun, Suleyman
    Nacar, Furkan
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2025,
  • [26] Soft Error Mitigation for Deep Convolution Neural Network on FPGA Accelerators
    Li, Wenshuo
    Ge, Guangjun
    Guo, Kaiyuan
    Chen, Xiaoming
    Wei, Qi
    Gao, Zhen
    Wang, Yu
    Yang, Huazhong
    2020 2ND IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2020), 2020, : 1 - 5
  • [27] Mapping of Deep Neural Network Accelerators on Wireless Multistage Interconnection NoCs
    Aydi, Yassine
    Mnejja, Sirine
    Mohammed, Faraqid Q.
    Abid, Mohamed
    APPLIED SCIENCES-BASEL, 2024, 14 (01):
  • [28] DNNZip: Selective Layers Compression Technique in Deep Neural Network Accelerators
    Landhiri, Habiba
    Palesi, Maurizio
    Monteleone, Salvatore
    Patti, Davide
    Ascia, Giuseppe
    Lorandel, Jordane
    Bourdel, Emmanuelle
    Catania, Vincenzo
    2020 23RD EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2020), 2020, : 526 - 533
  • [29] Optimizing deep learning inference on mobile devices with neural network accelerators
    Zeng, Xi
    Xu, Yunlong
    Zhi, Tian
    HIGH TECHNOLOGY LETTERS, 2019, 25 (04) : 417 - 425
  • [30] Quantization-Error-Robust Deep Neural Network for Embedded Accelerators
    Jung, Youngbeom
    Kim, Hyeonuk
    Choi, Yeongjae
    Kim, Lee-Sup
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2022, 69 (02) : 609 - 613