Enhancing the Utilization of Processing Elements in Spatial Deep Neural Network Accelerators

Cited by: 4
Authors
Asadikouhanjani, Mohammadreza [1 ]
Ko, Seok-Bum [1 ]
Affiliation
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Dataflow; deep neural network (DNN); negative output feature; processing element (PE); slack time; zero skipping;
DOI
10.1109/TCAD.2020.3031240
Chinese Library Classification (CLC): TP3 [Computing Technology, Computer Technology];
Discipline classification code: 0812;
Abstract
Equipping mobile platforms with deep learning applications is highly valuable: it enables healthcare services in remote areas, improves privacy, and lowers the required communication bandwidth. An efficient computation engine improves the performance of these platforms when running deep neural networks (DNNs). Energy-efficient DNN accelerators prune computations by skipping sparse (zero) operands and by detecting negative output features early. Unlike other common architectures such as systolic arrays, spatial DNN accelerators can in principle support such computation-pruning techniques, but they need a separate high-bandwidth data distribution fabric, such as buses or trees, to run these techniques efficiently and avoid network-on-chip (NoC) stalls. Spatial designs also suffer from divergence and unequal work distribution. Therefore, applying computation-pruning techniques in a spatial design still causes stalls inside the computation engine, even when it is equipped with an NoC that provides high bandwidth to the processing elements (PEs). In a spatial architecture, the PEs that finish their tasks earlier have slack time relative to the others. In this article, we propose an architecture with negligible area overhead that shares the scratchpads between the PEs in a novel way, exploiting the slack time introduced by computation-pruning techniques or by the NoC format. With our dataflow, a spatial engine benefits more efficiently from computation-pruning and data-reuse techniques. Compared to the reference design, the proposed method achieves a 1.24x speedup and a 1.18x improvement in energy efficiency per inference.
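The two pruning techniques the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the cycle counting, and the descending-weight ordering used for early negative detection (in the style of SnaPEA-like approaches) are illustrative assumptions, and activations are assumed non-negative as after a ReLU layer.

```python
def pe_zero_skip_mac(weights, activations):
    """Zero skipping: a PE's multiply-accumulate that prunes any
    multiplication where either operand is zero.
    Returns the partial sum and the number of MAC cycles spent."""
    psum = 0
    cycles = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # pruned: contributes nothing, costs no cycle
        psum += w * a
        cycles += 1
    return psum, cycles


def pe_early_negative(weights, activations):
    """Early negative output detection for a ReLU output feature.
    Positive-weight terms are accumulated first; once only negative
    weights remain and the partial sum is already <= 0, the final
    ReLU output is guaranteed to be 0, so the PE stops early."""
    # Sort by weight so positive contributions come first
    # (assumes activations >= 0, e.g. post-ReLU inputs).
    pairs = sorted(zip(weights, activations), key=lambda p: p[0], reverse=True)
    psum = 0
    for w, a in pairs:
        if w < 0 and psum <= 0:
            return 0  # remaining terms can only decrease psum
        psum += w * a
    return max(psum, 0)  # ReLU
```

PEs that take these early exits finish before their neighbors, which is exactly the slack time the proposed scratchpad-sharing dataflow sets out to reclaim.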
Pages: 1947-1951
Page count: 5