Enhancing the Utilization of Processing Elements in Spatial Deep Neural Network Accelerators

Cited by: 4
Authors
Asadikouhanjani, Mohammadreza [1 ]
Ko, Seok-Bum [1 ]
Affiliation
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Dataflow; deep neural network (DNN); negative output feature; processing element (PE); slack time; zero skipping;
DOI
10.1109/TCAD.2020.3031240
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Equipping mobile platforms with deep learning applications is highly valuable: such platforms enable healthcare services in remote areas, improve privacy, and reduce the required communication bandwidth. Designing an efficient computation engine enhances the performance of these platforms when running deep neural networks (DNNs). Energy-efficient DNN accelerators exploit sparsity skipping and early detection of negative output features to prune computations. Spatial DNN accelerators can, in principle, support such computation-pruning techniques better than other common architectures, such as systolic arrays. To run these techniques efficiently and avoid network-on-chip (NoC) stalls, however, they need a separate high-bandwidth data-distribution fabric, such as buses or trees. Spatial designs also suffer from divergence and unequal work distribution; as a result, applying computation-pruning techniques to a spatial design still causes stalls inside the computation engine, even when the design is equipped with an NoC that provides high bandwidth to the processing elements (PEs). In a spatial architecture, the PEs that finish their tasks earlier have slack time relative to the others. In this article, we propose an architecture with negligible area overhead that shares the scratchpads among the PEs in a novel way to exploit the slack time created by computation-pruning techniques or by the NoC format. With our dataflow, a spatial engine benefits from computation-pruning and data-reuse techniques more efficiently. Compared to the reference design, the proposed method achieves a 1.24x speedup and a 1.18x improvement in energy efficiency per inference.
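As a rough illustration only (not the paper's implementation), the following minimal Python sketch shows the two computation-pruning techniques named in the abstract, zero skipping and early negative output feature detection, applied to one output feature that feeds a ReLU. It assumes non-negative input activations (e.g., outputs of a preceding ReLU layer); all names are illustrative.

```python
# Minimal sketch of zero skipping and early negative output feature detection
# for a single output feature followed by ReLU. Assumes activations >= 0.

def pruned_dot_product(weights, activations):
    """Compute max(0, sum(w_i * a_i)) while pruning needless work."""
    # Process positive-weight terms first; once only negative-weight terms
    # remain, the partial sum can only decrease, so a non-positive partial
    # sum already guarantees a zero output after ReLU.
    terms = sorted(zip(weights, activations), key=lambda wa: wa[0], reverse=True)
    acc = 0.0
    for w, a in terms:
        if w == 0.0 or a == 0.0:      # zero skipping: sparse operands contribute nothing
            continue
        if w < 0.0 and acc <= 0.0:    # early negative detection: output is already 0
            return 0.0
        acc += w * a
    return max(acc, 0.0)              # ReLU

# Example: the zero activation is skipped and the loop terminates early.
print(pruned_dot_product([0.5, -1.0, -2.0], [0.0, 0.3, 0.4]))  # -> 0.0
```

In a spatial accelerator, PEs whose output features are pruned this way finish early, which is exactly the slack time the proposed scratchpad-sharing dataflow sets out to reclaim.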
Pages: 1947 - 1951
Page count: 5