DSP-Efficient Hardware Acceleration of Convolutional Neural Network Inference on FPGAs

Cited by: 21
Authors
Wang, Dong [1 ]
Xu, Ke [2 ]
Guo, Jingning [2 ]
Ghiasi, Soheil [3 ]
Affiliations
[1] Sch Comp & Informat Technol, Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[3] Univ Calif Davis, Dept Elect & Comp Engn, Davis, CA 95616 USA
Funding
Beijing Natural Science Foundation;
Keywords
Accelerator architectures; application specific integrated circuits; artificial neural networks; neural network hardware; reconfigurable architectures; ALGORITHM; CNN;
DOI
10.1109/TCAD.2020.2968023
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Field-programmable gate array (FPGA)-based accelerators for convolutional neural network (CNN) inference have received significant attention in recent years. The reported designs tend to adopt a similar underlying approach based on multiplier-accumulator (MAC) arrays, which yields strong demand for the available on-chip DSP blocks while leaving FPGA logic and memory resources underutilized. The practical outcome is that the computational roof of the accelerator is bound by the number of DSP blocks offered by the target FPGA. In addition, integrating the CNN accelerator with other functional units that may also need DSP blocks would degrade the inference performance. Leveraging the robustness of inference accuracy to limited arithmetic precision, we propose a transformation to the convolution computation, which transforms the accelerator design space and relaxes the pressure on the required DSP resources. Through analytical and empirical evaluations, we demonstrate that our approach enables us to strike a favorable balance between utilization of the FPGA on-chip memory, logic, and DSP resources, as a result of which our accelerator considerably outperforms the state of the art. We report the effectiveness of our approach on a variety of FPGA devices, including Cyclone-V, Stratix-V, and Arria-10, which are used in a large number of applications, ranging from embedded settings to high-performance computing. Our proposed technique yields 1.5x throughput improvement and 4x DSP resource reduction compared to the best frequency-domain convolution-based accelerator, and a 2.5x boost in raw arithmetic performance and 8.4x saving in DSPs compared to a state-of-the-art sparse convolution-based accelerator.
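To make the abstract's core idea concrete, the following is a toy sketch of one well-known way to trade DSP-based multiplication for logic-fabric arithmetic: approximating low-precision integer weights as a short signed sum of powers of two, so each MAC becomes a few shifts and adds. This is an illustrative assumption for exposition only; the function names (`to_pow2_terms`, `shift_add_mul`) are hypothetical and the paper's specific transformation is not reproduced here.

```python
def to_pow2_terms(w, max_terms=2):
    """Greedily decompose an integer weight w into at most max_terms
    signed powers of two (e.g. 7 -> [8, -1]). On an FPGA, each term
    costs only a wire shift plus an adder in logic, no DSP block."""
    terms = []
    r = w
    for _ in range(max_terms):
        if r == 0:
            break
        e = max(abs(r).bit_length() - 1, 0)
        # round to the nearest power of two, not just the floor
        if abs(r) - (1 << e) > (1 << (e + 1)) - abs(r):
            e += 1
        t = (1 << e) if r > 0 else -(1 << e)
        terms.append(t)
        r -= t
    return terms

def shift_add_mul(x, terms):
    """Multiply activation x by a decomposed weight using shifts/adds only."""
    acc = 0
    for t in terms:
        e = abs(t).bit_length() - 1
        acc += (x << e) if t > 0 else -(x << e)
    return acc

# Example: weight 7 becomes 8 - 1, so x*7 is computed as (x<<3) - x.
print(shift_add_mul(3, to_pow2_terms(7)))  # 21
```

Because inference accuracy is robust to limited precision (as the abstract notes), a bounded number of terms per weight can suffice, which is what shifts the arithmetic load from DSP blocks to the otherwise-underutilized logic fabric.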
Pages: 4867-4880
Page count: 14