DSP-Efficient Hardware Acceleration of Convolutional Neural Network Inference on FPGAs

Cited by: 21
Authors
Wang, Dong [1 ]
Xu, Ke [2 ]
Guo, Jingning [2 ]
Ghiasi, Soheil [3 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing Key Lab Adv Informat Sci & Network Technol, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[3] Univ Calif Davis, Dept Elect & Comp Engn, Davis, CA 95616 USA
Funding
Beijing Natural Science Foundation;
Keywords
Accelerator architectures; application specific integrated circuits; artificial neural networks; neural network hardware; reconfigurable architectures; ALGORITHM; CNN;
DOI
10.1109/TCAD.2020.2968023
CLC Classification Number
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Field-programmable gate array (FPGA)-based accelerators for convolutional neural network (CNN) inference have received significant attention in recent years. The reported designs tend to adopt a similar underlying approach based on multiplier-accumulator (MAC) arrays, which places heavy demand on the available on-chip DSP blocks while leaving FPGA logic and memory resources underutilized. The practical outcome is that the computational roof of the accelerator is bound by the number of DSP blocks offered by the target FPGA. In addition, integrating the CNN accelerator with other functional units that may also need DSP blocks would degrade the inference performance. Leveraging the robustness of inference accuracy to limited arithmetic precision, we propose a transformation of the convolution computation that reshapes the accelerator design space and relaxes the pressure on the required DSP resources. Through analytical and empirical evaluations, we demonstrate that our approach enables us to strike a favorable balance among utilization of the FPGA on-chip memory, logic, and DSP resources, as a result of which our accelerator considerably outperforms the state of the art. We report the effectiveness of our approach on a variety of FPGA devices, including Cyclone-V, Stratix-V, and Arria-10, which are used in a large number of applications ranging from embedded settings to high-performance computing. Our proposed technique yields a 1.5x throughput improvement and a 4x DSP resource reduction compared to the best frequency-domain convolution-based accelerator, and a 2.5x boost in raw arithmetic performance and an 8.4x saving in DSPs compared to a state-of-the-art sparse convolution-based accelerator.
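Note: the abstract does not spell out the proposed transformation, so the following toy example only illustrates the general tradeoff it alludes to. It is a minimal C sketch assuming, hypothetically, that weights are quantized to signed powers of two, one well-known way to turn each convolution multiply into a shift plus a conditional negation that synthesizes to FPGA LUT/carry logic instead of DSP blocks. It is not the paper's actual method, and all names (K, mac_dsp, mac_shift_add) are invented for illustration.

#include <stdint.h>
#include <stdio.h>

#define K 3  /* number of filter taps, chosen arbitrarily for the demo */

/* Baseline MAC: each iteration infers a hardware multiplier,
   which FPGA synthesis maps to a DSP block. */
int32_t mac_dsp(const int8_t x[K], const int8_t w[K]) {
    int32_t acc = 0;
    for (int i = 0; i < K; i++)
        acc += (int32_t)x[i] * (int32_t)w[i];
    return acc;
}

/* Hypothetical DSP-free variant: weights restricted to +/- 2^shift,
   so each "multiply" becomes a barrel shift plus a conditional
   negation -- logic that maps to LUTs and carry chains instead. */
int32_t mac_shift_add(const int8_t x[K], const uint8_t shift[K],
                      const uint8_t neg[K]) {
    int32_t acc = 0;
    for (int i = 0; i < K; i++) {
        /* shift via uint32_t so the left shift of a negative value
           stays well defined in C; in RTL this is just wiring */
        int32_t p = (int32_t)((uint32_t)(int32_t)x[i] << shift[i]);
        acc += neg[i] ? -p : p;
    }
    return acc;
}

int main(void) {
    int8_t  x[K]  = {10, -3, 7};
    int8_t  w[K]  = {4, -2, 1};  /* weights as exact multipliers...        */
    uint8_t sh[K] = {2, 1, 0};   /* ...and as shift amounts: 2^2, 2^1, 2^0 */
    uint8_t ng[K] = {0, 1, 0};   /* sign flags: only -2 carries a negation */
    /* both paths compute 10*4 + (-3)*(-2) + 7*1 = 53 */
    printf("dsp: %d  shift-add: %d\n", mac_dsp(x, w), mac_shift_add(x, sh, ng));
    return 0;
}

The point of the sketch is the resource mapping: mac_dsp consumes one DSP multiplier per tap, while mac_shift_add consumes only the logic fabric that the abstract notes is otherwise underutilized.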
Pages: 4867-4880
Number of pages: 14
Related Papers
50 records in total
  • [1] Efficient Hardware Acceleration of Convolutional Neural Networks
    Kala, S.
    Jose, Babita R.
    Mathew, Jimson
    Nalesh, S.
    32ND IEEE INTERNATIONAL SYSTEM ON CHIP CONFERENCE (IEEE SOCC 2019), 2019: 191-192
  • [2] An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs
    Lu, Liqiang
    Xie, Jiaming
    Huang, Ruirui
    Zhang, Jiansong
    Lin, Wei
    Liang, Yun
    2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2019: 17-25
  • [3] A Survey on Efficient Convolutional Neural Networks and Hardware Acceleration
    Ghimire, Deepak
    Kil, Dayoung
    Kim, Seong-heum
    ELECTRONICS, 2022, 11 (06)
  • [4] WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs
    Liu, Xinheng
    Chen, Yao
    Hao, Cong
    Dhar, Ashutosh
    Chen, Deming
    2021 IEEE 32ND INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2021), 2021: 258-265
  • [5] An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs
    Zhu, Chaoyang
    Huang, Kejie
    Yang, Shuyuan
    Zhu, Ziqi
    Zhang, Hejia
    Shen, Haibin
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2020, 28 (09): 1953-1965
  • [6] FPGA-QNN: Quantized Neural Network Hardware Acceleration on FPGAs
    Tasci, Mustafa
    Istanbullu, Ayhan
    Tumen, Vedat
    Kosunalp, Selahattin
    APPLIED SCIENCES-BASEL, 2025, 15 (02)
  • [7] An Efficient Task Assignment Framework to Accelerate DPU-Based Convolutional Neural Network Inference on FPGAs
    Zhu, Jiang
    Wang, Lizan
    Liu, Haolin
    Tian, Shujuan
    Deng, Qingyong
    Li, Jianqi
    IEEE ACCESS, 2020, 8: 83224-83237
  • [8] Efficient Hardware Architectures for Deep Convolutional Neural Network
    Wang, Jichen
    Lin, Jun
    Wang, Zhongfeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (06): 1941-1953
  • [9] Data and Hardware Efficient Design for Convolutional Neural Network
    Lin, Yue-Jin
    Chang, Tian Sheuan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (05): 1642-1651
  • [10] Evaluating Low-Memory GEMMs for Convolutional Neural Network Inference on FPGAs
    Zhang, Wentai
    Jiang, Ming
    Luo, Guojie
    28TH IEEE INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2020: 28-32