Enhancing the Utilization of Processing Elements in Spatial Deep Neural Network Accelerators

Cited by: 4
Authors
Asadikouhanjani, Mohammadreza [1 ]
Ko, Seok-Bum [1 ]
Affiliation
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Dataflow; deep neural network (DNN); negative output feature; processing element (PE); slack time; zero skipping;
DOI
10.1109/TCAD.2020.3031240
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Equipping mobile platforms with deep learning applications is highly valuable: such platforms can deliver healthcare services in remote areas, improve privacy, and reduce the required communication bandwidth. An efficient computation engine improves the performance of these platforms when running deep neural networks (DNNs). Energy-efficient DNN accelerators prune computation by skipping sparsity and by detecting negative output features early. Unlike other common architectures, such as systolic arrays, spatial DNN accelerators can in principle support such computation-pruning techniques. To run these techniques efficiently and avoid network-on-chip (NoC) stalls, they need a separate high-bandwidth data-distribution fabric, such as buses or trees. Spatial designs also suffer from divergence and unequal work distribution, so applying computation-pruning techniques to a spatial design still causes stalls inside the computation engine, even when the design is equipped with a high-bandwidth NoC for the processing elements (PEs). In a spatial architecture, PEs that finish their tasks earlier have slack time relative to the others. In this article, we propose an architecture with negligible area overhead that shares scratchpads between the PEs in a novel way, exploiting the slack time created by computation-pruning techniques or by the NoC format. With our dataflow, a spatial engine benefits more efficiently from computation-pruning and data-reuse techniques. Compared to the reference design, the proposed method achieves a 1.24x speedup and 1.18x higher energy efficiency per inference.
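To make the two pruning rules named in the abstract concrete, here is a minimal Python sketch of one PE computing a single output feature with zero-skipping and SnaPEA-style early negative detection (positive weights are processed first, so a non-positive running sum during the negative-weight phase proves the ReLU output will be zero). The function name, the MAC counter, and the assumption of non-negative (post-ReLU) input activations are illustrative, not taken from the paper.

```python
import numpy as np

def pe_output_feature(weights, activations, bias=0.0):
    """One PE's dot product with two computation-pruning rules:
    zero-skipping and early negative output feature detection.
    Assumes activations >= 0 (post-ReLU inputs)."""
    order = np.argsort(-weights)        # positive weights first (SnaPEA-style)
    acc = float(bias)
    macs_done = 0
    for i in order:
        w, a = float(weights[i]), float(activations[i])
        if a == 0.0:
            continue                    # zero-skipping: this MAC is pruned
        if w < 0.0 and acc <= 0.0:
            # All remaining products are <= 0, so the ReLU'd output is
            # provably 0: terminate this output feature early.
            break
        acc += w * a
        macs_done += 1
    return max(acc, 0.0), macs_done     # fused ReLU + MAC count (slack proxy)

# PEs that finish with fewer MACs accumulate slack relative to neighbours.
rng = np.random.default_rng(0)
w = rng.standard_normal(64)
a = np.maximum(rng.standard_normal(64), 0.0)   # sparse, non-negative inputs
out, macs = pe_output_feature(w, a)
print(f"output={out:.3f}, MACs executed={macs}/64")
```

Because different output features terminate after different numbers of MACs, PEs diverge; the scratchpad sharing proposed in the paper is what turns that divergence into usable slack time rather than stall cycles.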
Pages: 1947-1951
Page count: 5