Power efficiency enhancements of a multi-bit accelerator for memory prohibitive deep neural networks

Cited by: 6
Authors
Shivapakash S. [1 ]
Jain H. [2 ]
Hellwich O. [2 ]
Gerfers F. [1 ]
Affiliations
[1] Department of Computer Engineering and Microelectronics, Chair of Mixed Signal Circuit Design, Technical University of Berlin, Berlin
[2] Department of Computer Engineering and Microelectronics, Computer Vision and Remote Sensing, Technical University of Berlin, Berlin
Keywords
AlexNet; ASIC; Deep neural network; EfficientNet; FPGA; MobileNet; multi-bit accelerator; SqueezeNet; truncation;
DOI
10.1109/OJCAS.2020.3047225
Abstract
Convolutional Neural Networks (CNNs) are widely employed in contemporary artificial intelligence systems. However, these models have millions of connections between layers, which are both memory prohibitive and computationally expensive. Deploying these models in embedded mobile applications is constrained by limited resources, high power consumption, and the significant bandwidth required to access data from off-chip DRAM. Reducing data movement between on-chip memory and off-chip DRAM is the main criterion for achieving high throughput and better overall energy efficiency. Our proposed multi-bit accelerator achieves these goals by truncating the partial sum (Psum) results of the preceding layer before feeding them into the next layer. We demonstrate the architecture by inferencing with 32 bits for the first convolution layers and sequentially truncating bits on the MSB/LSB of the integer and fractional parts, without any further training of the original network. At the last fully connected layer, top-1 accuracy is maintained at a reduced bit width of 14, and top-5 accuracy down to a 10-bit width. The computation engine consists of a systolic array of 1024 processing elements (PEs). Large CNNs such as AlexNet, MobileNet, SqueezeNet, and EfficientNet were used as benchmark models, and a Virtex UltraScale FPGA was used to test the architecture. The proposed truncation scheme achieves a 49% power reduction and reduces resource utilization by 73.25% for look-up tables (LUTs), 68.76% for flip-flops (FFs), 74.60% for block RAMs (BRAMs), and 79.425% for digital signal processors (DSPs) compared with the 32-bit architecture. The design achieves a performance of 223.69 GOPS on the Virtex UltraScale FPGA, an overall throughput gain of 3.63× compared to prior FPGA accelerators. In addition, the overall power consumption is 4.5× lower than that of prior architectures.
The ASIC version of the accelerator was designed in a 22 nm FD-SOI CMOS process, achieving an overall energy efficiency of 2.03 TOPS/W with a total power consumption of 791 mW and an area of 1 mm × 1.2 mm. © 2020 IEEE.
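The abstract describes truncating fixed-point partial sums on both ends, dropping MSBs of the integer part and LSBs of the fractional part, before passing them to the next layer. The sketch below illustrates this idea in software; the specific bit-width split (a 32-bit Q16.16 input truncated to a 14-bit Q7.7 value) is an illustrative assumption, not the paper's exact number format.

```python
def truncate_psum(psum, frac_bits=16, keep_int=7, keep_frac=7):
    """Illustrative Psum truncation: quantize a real-valued partial sum to
    fixed point, drop LSBs of the fraction, saturate away integer MSBs.

    Assumed layout (hypothetical): 32-bit Q16.16 input, 14-bit Q7.7 output.
    """
    # Quantize to fixed point with `frac_bits` fractional bits.
    fixed = int(round(psum * (1 << frac_bits)))
    # LSB truncation: keep only `keep_frac` fractional bits.
    fixed >>= (frac_bits - keep_frac)
    # MSB truncation: saturate to the range representable with
    # `keep_int` integer bits (plus sign).
    max_val = (1 << (keep_int + keep_frac)) - 1
    min_val = -(1 << (keep_int + keep_frac))
    fixed = max(min_val, min(max_val, fixed))
    # Rescale back to a real number for the next layer's input.
    return fixed / (1 << keep_frac)

print(truncate_psum(3.14159))  # ~3.140625: 7 fractional bits of precision
print(truncate_psum(500.0))    # saturates near the Q7.7 maximum
```

Because no retraining is involved, the accuracy impact comes purely from the lost fractional precision and the saturation of out-of-range values, which matches the abstract's observation that accuracy holds down to 14-bit (top-1) and 10-bit (top-5) widths.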
Pages: 156-169 (13 pages)
Related papers (50 total)
  • [1] A Power Efficient Multi-Bit Accelerator for Memory Prohibitive Deep Neural Networks
    Shivapakash, Suhas
    Jain, Hardik
    Hellwich, Olaf
    Gerfers, Friedel
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
  • [2] Adversarial Robustness of Multi-bit Convolutional Neural Networks
    Frickenstein, Lukas
    Sampath, Shambhavi Balamuthu
    Mori, Pierpaolo
    Vemparala, Manoj-Rohit
    Fasfous, Nael
    Frickenstein, Alexander
    Unger, Christian
    Passerone, Claudio
    Stechele, Walter
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 3, INTELLISYS 2023, 2024, 824 : 157 - 174
  • [3] A High Efficiency Accelerator for Deep Neural Networks
    Zaidy, Aliasger
    Chang, Andre Xian Ming
    Gokhale, Vinayak
    Culurciello, Eugenio
    2018 1ST WORKSHOP ON ENERGY EFFICIENT MACHINE LEARNING AND COGNITIVE COMPUTING FOR EMBEDDED APPLICATIONS (EMC2), 2018, : 9 - 13
  • [4] Multi-bit organic ferroelectric memory
    Khikhlovskyi, Vsevolod
    Gorbunov, Andrey V.
    van Breemen, Albert J. J. M.
    Janssen, Rene A. J.
    Gelinck, Gerwin H.
    Kemerink, Martijn
    ORGANIC ELECTRONICS, 2013, 14 (12) : 3399 - 3405
  • [5] Storage Reliability of Multi-bit Flash Oriented to Deep Neural Network
    Xiang, Y. C.
    Huang, P.
    Yang, H. Z.
    Wang, K. L.
    Han, R. Z.
    Shen, W. S.
    Feng, Y. L.
    Liu, C.
    Liu, X. Y.
    Kang, J. F.
    2019 IEEE INTERNATIONAL ELECTRON DEVICES MEETING (IEDM), 2019,
  • [6] Universal BlackMarks: Key-Image-Free Blackbox Multi-Bit Watermarking of Deep Neural Networks
    Li, Li
    Zhang, Weiming
    Barni, Mauro
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 36 - 40
  • [7] PTMQ: Post-training Multi-Bit Quantization of Neural Networks
    Xu, Ke
    Li, Zhongcheng
    Wang, Shanshan
    Zhang, Xingyi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 16193 - 16201
  • [8] On-Chip Memory Optimization of High Efficiency Accelerator for Deep Convolutional Neural Networks
    Lai, Tzu-Yi
    Chen, Kuan-Hung
    2018 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC), 2018, : 82 - 83
  • [9] Design of Asynchronous Multi-Bit OTP Memory
    Choi, Chul-Ho
    Lee, Jae-Hyung
    Kim, Tae-Hoon
    Shim, Oe-Yong
    Hwang, Yoon-Geum
    Ahn, Kwang-Seon
    Ha, Pan-Bong
    Kim, Young-Hee
    IEICE TRANSACTIONS ON ELECTRONICS, 2009, E92C (01) : 173 - 177
  • [10] A Memristor as Multi-Bit Memory: Feasibility Analysis
    Bass, Ori
    Fish, Alexander
    Naveh, Doron
    RADIOENGINEERING, 2015, 24 (02) : 425 - 430