An Uninterrupted Processing Technique-Based High-Throughput and Energy-Efficient Hardware Accelerator for Convolutional Neural Networks

Cited by: 5
Authors:
Islam, Md Najrul [1 ]
Shrestha, Rahul [1 ]
Chowdhury, Shubhajit Roy [1 ]
Affiliations:
[1] Indian Inst Technol IIT Mandi, Sch Comp & Elect Engn, Mandi 175075, Himachal Pradesh, India
Keywords:
Convolutional neural network (CNN); digital VLSI-architecture design; field-programmable gate array (FPGA); VGG-16 and GoogLeNet neural networks; VLSI; CNN;
DOI
10.1109/TVLSI.2022.3210963
Chinese Library Classification (CLC):
TP3 [Computing technology, computer technology];
Discipline code:
0812;
Abstract
This article proposes an uninterrupted processing technique for convolutional neural network (CNN) accelerators. It allows the accelerator to perform processing-element (PE) operations and data fetching simultaneously, which reduces latency and enhances the achievable throughput. Building on this technique, the work also presents a low-latency VLSI architecture of the CNN accelerator that uses a new random-access line-buffer (RALB)-based design of the PE array. The proposed accelerator architecture is further optimized by reusing local data within the PE array, yielding better energy conservation. The CNN accelerator has been implemented in hardware on the Zynq UltraScale+ MPSoC ZCU102 FPGA board, where it operates at a maximum clock frequency of 340 MHz and consumes 4.11 W of total power. With 864 PEs, the accelerator delivers a peak throughput of 587.52 GOPs and an energy efficiency of 142.95 GOPs/W. Comparison of these implementation results with the literature shows that the proposed CNN accelerator achieves 33.42% higher throughput and 6.24x better energy efficiency than the state-of-the-art work. Finally, the field-programmable gate array (FPGA) prototype of the proposed CNN accelerator has been functionally validated in a real-world test setup for object detection from an input image using the GoogLeNet neural network.
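The core idea described above, overlapping PE-array computation with the fetch of the next input data so that the PEs are never left idle, is commonly realized with a ping-pong pair of line buffers. The following C++ fragment is a minimal software sketch of that overlap concept only; the buffer sizes, tile counts, and function names (fetch_tile, pe_compute) are hypothetical stand-ins and do not reproduce the paper's RALB-based PE-array micro-architecture.

```cpp
// Minimal, illustrative sketch of uninterrupted (double-buffered) processing:
// while the PE array computes on one line buffer, the next input tile is
// fetched into the other buffer, so compute never waits on data movement.
// All sizes and names here are hypothetical placeholders.
#include <array>
#include <cstdio>
#include <vector>

constexpr int kLineWidth = 8;  // hypothetical width of one buffered input line
constexpr int kNumTiles  = 4;  // hypothetical number of input tiles to process

using Line = std::array<int, kLineWidth>;

// Stand-in for fetching one input tile from external memory into a line buffer.
void fetch_tile(int tile, Line& buf) {
    for (int i = 0; i < kLineWidth; ++i) buf[i] = tile * kLineWidth + i;
}

// Stand-in for the PE-array computation on the active line buffer
// (a simple sum here, in place of the real MAC operations).
int pe_compute(const Line& buf) {
    int acc = 0;
    for (int v : buf) acc += v;
    return acc;
}

int main() {
    Line buffers[2];           // ping-pong pair of line buffers
    std::vector<int> results;

    fetch_tile(0, buffers[0]); // prologue: fill the first buffer
    for (int t = 0; t < kNumTiles; ++t) {
        int cur = t & 1, nxt = cur ^ 1;
        // In hardware these two steps overlap in the same cycle window:
        // the PE array consumes buffers[cur] while the fetch unit fills buffers[nxt].
        if (t + 1 < kNumTiles) fetch_tile(t + 1, buffers[nxt]);
        results.push_back(pe_compute(buffers[cur]));
    }

    for (int t = 0; t < static_cast<int>(results.size()); ++t)
        std::printf("tile %d -> partial result %d\n", t, results[t]);
    return 0;
}
```

In the actual accelerator, the fetch and compute steps of each iteration occur concurrently in dedicated hardware, which is what removes the stall cycles that would otherwise cap throughput; the sketch serializes them only because it is plain software.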
Pages: 1891-1901
Number of pages: 11
Related Papers
50 records in total
  • [21] An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs
    Lu, Liqiang
    Xie, Jiaming
    Huang, Ruirui
    Zhang, Jiansong
    Lin, Wei
    Liang, Yun
    2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2019, : 17 - 25
  • [22] A High-Throughput and Energy-Efficient RRAM-based Convolutional Neural Network using Data Encoding and Dynamic Quantization
    Chen, Xizi
    Jiang, Jingbo
    Zhu, Jingyang
    Tsui, Chi-Ying
    2018 23RD ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2018, : 123 - 128
  • [23] Domino: Graph Processing Services on Energy-efficient Hardware Accelerator
    Xu, Chongchong
    Wang, Chao
    Gong, Lei
    Jin, Lihui
    Li, Xi
    Zhou, Xuehai
    2018 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES (IEEE ICWS 2018), 2018, : 274 - 281
  • [24] Selective Pruning of Sparsity-Supported Energy-Efficient Accelerator for Convolutional Neural Networks
    Liu, Chia-Chi
    Zhang, Xuezhi
    Wey, I-Chyn
    Teo, T. Hui
    2023 IEEE 16TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP, MCSOC, 2023, : 454 - 461
  • [25] An FPGA-Based YOLOv6 Accelerator for High-Throughput and Energy-Efficient Object Detection
    Sha, Xingan
    Yanagisawa, Masao
    Shi, Youhua
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2025, E108A (03) : 473 - 481
  • [26] An Efficient Hardware Accelerator for Block Sparse Convolutional Neural Networks on FPGA
    Yin, Xiaodi
    Wu, Zhipeng
    Li, Dejian
    Shen, Chongfei
    Liu, Yu
    IEEE EMBEDDED SYSTEMS LETTERS, 2024, 16 (02) : 158 - 161
  • [27] An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs
    Zhu, Chaoyang
    Huang, Kejie
    Yang, Shuyuan
    Zhu, Ziqi
    Zhang, Hejia
    Shen, Haibin
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2020, 28 (09) : 1953 - 1965
  • [28] Energy-Efficient High-Throughput Staircase Decoders
    Fougstedt, Christoffer
    Larsson-Edefors, Per
    2018 OPTICAL FIBER COMMUNICATIONS CONFERENCE AND EXPOSITION (OFC), 2018,
  • [29] Energy-Efficient Convolutional Neural Networks with Deterministic Bit-Stream Processing
    Faraji, S. Rasoul
    Najafi, M. Hassan
    Li, Bingzhe
    Lilja, David J.
    Bazargan, Kia
    2019 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2019, : 1757 - 1762
  • [30] BitBlade: Energy-Efficient Variable Bit-Precision Hardware Accelerator for Quantized Neural Networks
    Ryu, Sungju
    Kim, Hyungjun
    Yi, Wooseok
    Kim, Eunhwan
    Kim, Yulhwa
    Kim, Taesu
    Kim, Jae-Joon
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2022, 57 (06) : 1924 - 1935