DSP-Efficient Hardware Acceleration of Convolutional Neural Network Inference on FPGAs

Cited by: 21
Authors
Wang, Dong [1 ]
Xu, Ke [2 ]
Guo, Jingning [2 ]
Ghiasi, Soheil [3 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing Key Lab Adv Informat Sci & Network Technol, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[3] Univ Calif Davis, Dept Elect & Comp Engn, Davis, CA 95616 USA
Funding
Beijing Natural Science Foundation;
Keywords
Accelerator architectures; application specific integrated circuits; artificial neural networks; neural network hardware; reconfigurable architectures; ALGORITHM; CNN;
DOI
10.1109/TCAD.2020.2968023
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology];
Subject classification code
0812;
Abstract
Field-programmable gate array (FPGA)-based accelerators for convolutional neural network (CNN) inference have received significant attention in recent years. The reported designs tend to adopt a similar underlying approach based on multiplier-accumulator (MAC) arrays, which places heavy demand on the available on-chip DSP blocks while leaving FPGA logic and memory resources underutilized. The practical outcome is that the computational roof of the accelerator is bound by the number of DSP blocks offered by the target FPGA. In addition, integrating the CNN accelerator with other functional units that may also need DSP blocks would degrade the inference performance. Leveraging the robustness of inference accuracy to limited arithmetic precision, we propose a transformation to the convolution computation that transforms the accelerator design space and relaxes the pressure on the required DSP resources. Through analytical and empirical evaluations, we demonstrate that our approach strikes a favorable balance among the FPGA's on-chip memory, logic, and DSP resources, as a result of which our accelerator considerably outperforms the state of the art. We report the effectiveness of our approach on a variety of FPGA devices, including Cyclone-V, Stratix-V, and Arria-10, which are used in a wide range of applications, from embedded settings to high-performance computing. Our proposed technique yields a 1.5x throughput improvement and a 4x reduction in DSP usage compared to the best frequency-domain convolution-based accelerator, and a 2.5x boost in raw arithmetic performance and an 8.4x saving in DSPs compared to a state-of-the-art sparse convolution-based accelerator.
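
As background for the abstract's claim that limited arithmetic precision can relax DSP pressure, the following is a minimal, illustrative Python sketch of one generic DSP-saving trick: packing two 8-bit weight-by-activation products into a single wide multiplication, so that one FPGA DSP multiplier can serve two MAC lanes. The function name, bit widths, and packing scheme are assumptions chosen purely for illustration; the paper's actual convolution transformation is described in the article itself and is not reproduced here.

    # Illustrative sketch (not the paper's method): packing two unsigned 8-bit
    # multiplications into one wide multiply, the kind of trick that lets a
    # single wide FPGA DSP multiplier serve two MAC lanes at once.

    def packed_dual_multiply(a, w0, w1):
        """Compute (a*w0, a*w1) with a single wide multiplication.

        a, w0, w1 are assumed to be unsigned 8-bit values (0..255), so each
        partial product fits in 16 bits and the two results never overlap.
        """
        assert 0 <= a < 256 and 0 <= w0 < 256 and 0 <= w1 < 256
        packed_w = (w1 << 16) | w0      # place w1 16 bits above w0
        product = a * packed_w          # one hardware-style multiplication
        p0 = product & 0xFFFF           # lower 16 bits -> a * w0
        p1 = product >> 16              # upper bits    -> a * w1
        return p0, p1

    if __name__ == "__main__":
        a, w0, w1 = 200, 37, 251
        p0, p1 = packed_dual_multiply(a, w0, w1)
        assert (p0, p1) == (a * w0, a * w1)
        print(p0, p1)  # 7400 50200

Because the low product is bounded by 255 * 255 < 2^16, it cannot carry into the high half, which is what makes the two results separable after a single multiplication.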
Pages: 4867-4880
Number of pages: 14
Related papers
50 records in total
  • [21] SAF-CNN: A Sparse Acceleration Framework of Convolutional Neural Network for Embedded FPGAs
    Xie K.
    Yi D.
    Liu Y.
    Liu H.
    He X.
    Gong C.
    Lu Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (05) : 1053 - 1072
  • [22] VWA: Hardware Efficient Vectorwise Accelerator for Convolutional Neural Network
    Chang, Kuo-Wei
    Chang, Tian-Sheuan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2020, 67 (01) : 145 - 154
  • [23] Compressing Sparse Ternary Weight Convolutional Neural Networks for Efficient Hardware Acceleration
    Wi, Hyeonwook
    Kim, Hyeonuk
    Choi, Seungkyu
    Kim, Lee-Sup
    2019 IEEE/ACM INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN (ISLPED), 2019
  • [24] Optimizing Neural Network Inference in Edge Robotics by Harnessing FPGA Hardware Acceleration
    Rao, Kolli Himantha
    Jagan, S.
    Pandian, Vinoth
    Suganthi, R.
    Senthil Rama, R.
    JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (05) : 1935 - 1943
  • [25] Convolutional neural network for bio-medical image segmentation with hardware acceleration
    Vardhana, M.
    Arunkumar, N.
    Lasrado, Sunitha
    Abdulhay, Enas
    Ramirez-Gonzalez, Gustavo
    COGNITIVE SYSTEMS RESEARCH, 2018, 50 : 10 - 14
  • [26] Neural network implementation in hardware using FPGAs
    Sahin, Suhap
    Becerikli, Yasar
    Yazici, Suleyman
    NEURAL INFORMATION PROCESSING, PT 3, PROCEEDINGS, 2006, 4234 : 1105 - 1112
  • [27] Research on quantitative inference acceleration technology of Convolutional Neural Network for ARM Platform
    Wang, Xuqiang
    Zhang, Qianyi
    Yang, Yifan
    Zong, Xiangrui
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 208 - 211
  • [28] Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence
    Liang, Yong
    Tan, Junwen
    Xie, Zhisong
    Chen, Zetao
    Lin, Daoqian
    Yang, Zhenhao
    SENSORS, 2024, 24 (01)
  • [29] VersaTile Convolutional Neural Network Mapping on FPGAs
    Munio-Gracia, A.
    Fernandez-Berni, J.
    Carmona-Galan, R.
    Rodriguez-Vazquez, A.
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020
  • [30] VCONV: A Convolutional Neural Network Accelerator for FPGAs
    Neelam, Srikanth
    Prince, A. Amalin
    ELECTRONICS, 2025, 14 (04)