An Uninterrupted Processing Technique-Based High-Throughput and Energy-Efficient Hardware Accelerator for Convolutional Neural Networks

Cited by: 5
Authors:
Islam, Md Najrul [1 ]
Shrestha, Rahul [1 ]
Chowdhury, Shubhajit Roy [1 ]
Affiliations:
[1] Indian Inst Technol IIT Mandi, Sch Comp & Elect Engn, Mandi 175075, Himachal Pradesh, India
Keywords:
Convolutional neural network (CNN); digital VLSI-architecture design; field-programmable gate array (FPGA); VGG-16 and GoogLeNet neural networks; VLSI; CNN;
DOI
10.1109/TVLSI.2022.3210963
Chinese Library Classification (CLC):
TP3 [Computing technology, computer technology];
Discipline code:
0812;
Abstract
This article proposes an uninterrupted processing technique for convolutional neural network (CNN) accelerators. It allows the accelerator to perform processing-element (PE) operations and data fetching simultaneously, which reduces latency and enhances the achievable throughput. Building on this technique, the work also presents a low-latency VLSI architecture of the CNN accelerator that uses a new random-access line-buffer (RALB)-based design of the PE array. The proposed accelerator architecture is further optimized by reusing local data within the PE array, yielding better energy conservation. The CNN accelerator has been implemented in hardware on the Zynq UltraScale+ MPSoC ZCU102 FPGA board, where it operates at a maximum clock frequency of 340 MHz and consumes 4.11 W of total power. With 864 PEs, the accelerator delivers a peak throughput of 587.52 GOPs and an energy efficiency of 142.95 GOPs/W. Comparison of these implementation results with the literature shows that the proposed CNN accelerator achieves 33.42% higher throughput and 6.24x better energy efficiency than the state-of-the-art work. Finally, the field-programmable gate array (FPGA) prototype of the proposed CNN accelerator has been functionally validated in a real-world test setup for object detection from an input image using the GoogLeNet neural network.
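The core idea described above, overlapping PE-array computation with the fetch of the next input data so that the PEs are never left idle, is commonly realized with a ping-pong pair of line buffers. The following C++ fragment is a minimal software sketch of that overlap concept only; the buffer sizes, tile counts, and function names (fetch_tile, pe_compute) are hypothetical stand-ins and do not reproduce the paper's RALB-based PE-array micro-architecture.

```cpp
// Minimal, illustrative sketch of uninterrupted (double-buffered) processing:
// while the PE array computes on one line buffer, the next input tile is
// fetched into the other buffer, so compute never waits on data movement.
// All sizes and names here are hypothetical placeholders.
#include <array>
#include <cstdio>
#include <vector>

constexpr int kLineWidth = 8;  // hypothetical width of one buffered input line
constexpr int kNumTiles  = 4;  // hypothetical number of input tiles to process

using Line = std::array<int, kLineWidth>;

// Stand-in for fetching one input tile from external memory into a line buffer.
void fetch_tile(int tile, Line& buf) {
    for (int i = 0; i < kLineWidth; ++i) buf[i] = tile * kLineWidth + i;
}

// Stand-in for the PE-array computation on the active line buffer
// (a simple sum here, in place of the real MAC operations).
int pe_compute(const Line& buf) {
    int acc = 0;
    for (int v : buf) acc += v;
    return acc;
}

int main() {
    Line buffers[2];           // ping-pong pair of line buffers
    std::vector<int> results;

    fetch_tile(0, buffers[0]); // prologue: fill the first buffer
    for (int t = 0; t < kNumTiles; ++t) {
        int cur = t & 1, nxt = cur ^ 1;
        // In hardware these two steps overlap in the same cycle window:
        // the PE array consumes buffers[cur] while the fetch unit fills buffers[nxt].
        if (t + 1 < kNumTiles) fetch_tile(t + 1, buffers[nxt]);
        results.push_back(pe_compute(buffers[cur]));
    }

    for (int t = 0; t < static_cast<int>(results.size()); ++t)
        std::printf("tile %d -> partial result %d\n", t, results[t]);
    return 0;
}
```

In the actual accelerator, the fetch and compute steps of each iteration occur concurrently in dedicated hardware, which is what removes the stall cycles that would otherwise cap throughput; the sketch serializes them only because it is plain software.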
Pages: 1891-1901
Number of pages: 11
Related Papers
50 records in total
  • [21] An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs
    Lu, Liqiang
    Xie, Jiaming
    Huang, Ruirui
    Zhang, Jiansong
    Lin, Wei
    Liang, Yun
    2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2019, : 17 - 25
  • [22] A High-Throughput and Energy-Efficient RRAM-based Convolutional Neural Network using Data Encoding and Dynamic Quantization
    Chen, Xizi
    Jiang, Jingbo
    Zhu, Jingyang
    Tsui, Chi-Ying
    2018 23RD ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2018, : 123 - 128
  • [23] Domino: Graph Processing Services on Energy-efficient Hardware Accelerator
    Xu, Chongchong
    Wang, Chao
    Gong, Lei
    Jin, Lihui
    Li, Xi
    Zhou, Xuehai
    2018 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES (IEEE ICWS 2018), 2018, : 274 - 281
  • [24] Selective Pruning of Sparsity-Supported Energy-Efficient Accelerator for Convolutional Neural Networks
    Liu, Chia-Chi
    Zhang, Xuezhi
    Wey, I-Chyn
    Teo, T. Hui
    2023 IEEE 16TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP, MCSOC, 2023, : 454 - 461
  • [25] An FPGA-Based YOLOv6 Accelerator for High-Throughput and Energy-Efficient Object Detection
    Sha, Xingan
    Yanagisawa, Masao
    Shi, Youhua
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2025, E108A (03) : 473 - 481
  • [26] An Efficient Hardware Accelerator for Block Sparse Convolutional Neural Networks on FPGA
    Yin, Xiaodi
    Wu, Zhipeng
    Li, Dejian
    Shen, Chongfei
    Liu, Yu
    IEEE EMBEDDED SYSTEMS LETTERS, 2024, 16 (02) : 158 - 161
  • [27] An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs
    Zhu, Chaoyang
    Huang, Kejie
    Yang, Shuyuan
    Zhu, Ziqi
    Zhang, Hejia
    Shen, Haibin
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2020, 28 (09) : 1953 - 1965
  • [28] Energy-Efficient High-Throughput Staircase Decoders
    Fougstedt, Christoffer
    Larsson-Edefors, Per
    2018 OPTICAL FIBER COMMUNICATIONS CONFERENCE AND EXPOSITION (OFC), 2018,
  • [29] Energy-Efficient Convolutional Neural Networks with Deterministic Bit-Stream Processing
    Faraji, S. Rasoul
    Najafi, M. Hassan
    Li, Bingzhe
    Lilja, David J.
    Bazargan, Kia
    2019 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2019, : 1757 - 1762
  • [30] BitBlade: Energy-Efficient Variable Bit-Precision Hardware Accelerator for Quantized Neural Networks
    Ryu, Sungju
    Kim, Hyungjun
    Yi, Wooseok
    Kim, Eunhwan
    Kim, Yulhwa
    Kim, Taesu
    Kim, Jae-Joon
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2022, 57 (06) : 1924 - 1935