A power efficiency enhancements of a multi-bit accelerator for memory prohibitive deep neural networks

Cited by: 6
Authors:
Shivapakash S. [1 ]
Jain H. [2 ]
Hellwich O. [2 ]
Gerfers F. [1 ]
Affiliations:
[1] Department of Computer Engineering and Microelectronics, Chair of Mixed Signal Circuit Design, Technical University of Berlin, Berlin
[2] Department of Computer Engineering and Microelectronics, Computer Vision and Remote Sensing, Technical University of Berlin, Berlin
Keywords: AlexNet; ASIC; deep neural network; EfficientNet; FPGA; MobileNet; multi-bit accelerator; SqueezeNet; truncation
DOI: 10.1109/OJCAS.2020.3047225
Abstract:
Convolutional Neural Networks (CNNs) are widely employed in contemporary artificial intelligence systems. However, these models have millions of connections between layers, which are both memory-prohibitive and computationally expensive. Deploying them in embedded mobile applications is resource-limited, with high power consumption and a significant bandwidth requirement for accessing data from off-chip DRAM. Reducing data movement between on-chip memory and off-chip DRAM is the main criterion for achieving high throughput and better overall energy efficiency. Our proposed multi-bit accelerator achieves these goals by truncating the partial-sum (Psum) results of the preceding layer before feeding them into the next layer. We demonstrate the architecture by running inference at 32 bits for the first convolution layer and sequentially truncating bits from the MSB/LSB of the integer and fractional parts, without any further training of the original network. At the last fully connected layer, top-1 accuracy is maintained down to a reduced bit width of 14, and top-5 accuracy down to a bit width of 10. The computation engine consists of a systolic array of 1024 processing elements (PEs). Large CNNs such as AlexNet, MobileNet, SqueezeNet, and EfficientNet were used as benchmark models, and a Virtex UltraScale FPGA was used to test the architecture. Compared with the 32-bit architecture, the proposed truncation scheme reduces power by 49% and resource utilization by 73.25% for look-up tables (LUTs), 68.76% for flip-flops (FFs), 74.60% for block RAMs (BRAMs), and 79.425% for digital signal processors (DSPs). The design achieves a performance of 223.69 GOPS on the Virtex UltraScale FPGA, an overall throughput gain of 3.63× over prior FPGA accelerators, with 4.5× lower overall power consumption than prior architectures. The ASIC version of the accelerator was designed in a 22nm FDSOI CMOS process, achieving an overall energy efficiency of 2.03 TOPS/W with a total power consumption of 791 mW and an area of 1 mm × 1.2 mm. © 2020 IEEE.
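The abstract describes truncating the 32-bit partial-sum (Psum) results of one layer on both the MSB side (integer part) and the LSB side (fractional part) before feeding them to the next layer, e.g., down to a 14-bit word. As a rough illustration, here is a minimal NumPy sketch of such MSB/LSB fixed-point truncation; the Q16.16 source format, the 8/6 integer/fractional split, and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical fixed-point layout: the abstract only says the Psums start at
# 32 bits, so the Q16.16 split below and the 8/6 split of the reduced
# 14-bit word are illustrative assumptions, not the paper's parameters.
FULL_FRAC_BITS = 16

def to_fixed(x, frac_bits=FULL_FRAC_BITS):
    """Quantize a float array to fixed point with `frac_bits` fractional bits."""
    return np.round(np.asarray(x) * (1 << frac_bits)).astype(np.int64)

def truncate_psum(psum_fx, keep_int_bits, keep_frac_bits,
                  frac_bits=FULL_FRAC_BITS):
    """Truncate a fixed-point partial sum on both ends:
    LSB side: drop fractional bits below `keep_frac_bits` (right shift);
    MSB side: saturate to the signed range of the reduced word."""
    truncated = psum_fx >> (frac_bits - keep_frac_bits)   # LSB truncation
    total_bits = keep_int_bits + keep_frac_bits
    lo = -(1 << (total_bits - 1))                         # MSB truncation via
    hi = (1 << (total_bits - 1)) - 1                      # signed saturation
    truncated = np.clip(truncated, lo, hi)
    # Return the reduced-width code and its real value for inspection.
    return truncated, truncated.astype(np.float64) / (1 << keep_frac_bits)

# Example: truncate 32-bit Psums to a 14-bit word (8 integer + 6 fractional).
psums = to_fixed([3.14159, -120.5, 0.007, 2000.25])
codes, values = truncate_psum(psums, keep_int_bits=8, keep_frac_bits=6)
print(values)  # -> [3.140625, -120.5, 0.0, 127.984375]; last value saturates
```

Saturating rather than wrapping on the MSB side keeps large partial sums from flipping sign, which is a common design choice in such truncation hardware.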
Pages: 156-169
Page count: 13