DSP-Efficient Hardware Acceleration of Convolutional Neural Network Inference on FPGAs

Cited: 21
Authors
Wang, Dong [1 ]
Xu, Ke [2 ]
Guo, Jingning [2 ]
Ghiasi, Soheil [3 ]
Affiliations
[1] Sch Comp & Informat Technol, Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[3] Univ Calif Davis, Dept Elect & Comp Engn, Davis, CA 95616 USA
Funding
Beijing Natural Science Foundation;
Keywords
Accelerator architectures; application specific integrated circuits; artificial neural networks; neural network hardware; reconfigurable architectures; ALGORITHM; CNN;
DOI
10.1109/TCAD.2020.2968023
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Field-programmable gate array (FPGA)-based accelerators for convolutional neural network (CNN) inference have received significant attention in recent years. The reported designs tend to adopt a similar underlying approach based on multiplier-accumulator (MAC) arrays, which places heavy demand on the available on-chip DSP blocks while leaving FPGA logic and memory resources underutilized. The practical outcome is that the computational roof of the accelerator is bound by the number of DSP blocks offered by the target FPGA. In addition, integrating the CNN accelerator with other functional units that also need DSP blocks degrades inference performance. Leveraging the robustness of inference accuracy to limited arithmetic precision, we propose a transformation to the convolution computation that transforms the accelerator design space and relaxes the pressure on the required DSP resources. Through analytical and empirical evaluations, we demonstrate that our approach strikes a favorable balance between utilization of the FPGA on-chip memory, logic, and DSP resources, enabling our accelerator to considerably outperform the state of the art. We report the effectiveness of our approach on a variety of FPGA devices, including Cyclone-V, Stratix-V, and Arria-10, which are used in a large number of applications ranging from embedded settings to high-performance computing. Our proposed technique yields a 1.5x throughput improvement and a 4x DSP resource reduction compared to the best frequency-domain convolution-based accelerator, and a 2.5x boost in raw arithmetic performance and an 8.4x saving in DSPs compared to a state-of-the-art sparse convolution-based accelerator.
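To make the abstract's premise concrete, the sketch below (an illustration written for this record, not the paper's proposed transformed design) shows the direct MAC-based 2D convolution that conventional accelerators unroll in hardware. Each multiply-accumulate in the inner loops corresponds to one hardware multiplier, which on an FPGA is typically mapped to a DSP block; the MAC count therefore indicates why the parallelism, and hence the throughput roof, of such designs is bound by the device's DSP budget.

```python
# Illustrative baseline (hypothetical helper, not from the paper): a
# direct MAC-array style 2D convolution with "valid" padding. When an
# accelerator unrolls these loops spatially, each multiply-accumulate
# below occupies one DSP-block multiplier.
def conv2d_mac(image, kernel):
    H, W = len(image), len(image[0])
    K = len(kernel)
    out = [[0] * (W - K + 1) for _ in range(H - K + 1)]
    macs = 0  # total multiply-accumulate operations performed
    for r in range(H - K + 1):
        for c in range(W - K + 1):
            acc = 0
            for i in range(K):
                for j in range(K):
                    acc += image[r + i][c + j] * kernel[i][j]  # one MAC
                    macs += 1
            out[r][c] = acc
    return out, macs

# A 3x3 image with a 2x2 kernel needs K*K MACs per output pixel:
out, macs = conv2d_mac([[1, 1, 1], [1, 1, 1], [1, 1, 1]],
                       [[1, 1], [1, 1]])
# out == [[4, 4], [4, 4]], macs == 16
```

The paper's contribution, per the abstract, is to transform this computation (exploiting tolerance to limited arithmetic precision) so that less of the work lands on DSP multipliers and more on the otherwise-underutilized logic and memory resources.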
Pages: 4867-4880 (14 pages)