DSP-Efficient Hardware Acceleration of Convolutional Neural Network Inference on FPGAs

Cited: 21
Authors
Wang, Dong [1 ]
Xu, Ke [2 ]
Guo, Jingning [2 ]
Ghiasi, Soheil [3 ]
Affiliations
[1] Sch Comp & Informat Technol, Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[3] Univ Calif Davis, Dept Elect & Comp Engn, Davis, CA 95616 USA
Funding
Beijing Natural Science Foundation;
Keywords
Accelerator architectures; application specific integrated circuits; artificial neural networks; neural network hardware; reconfigurable architectures; ALGORITHM; CNN;
DOI
10.1109/TCAD.2020.2968023
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Field-programmable gate array (FPGA)-based accelerators for convolutional neural network (CNN) inference have received significant attention in recent years. The reported designs tend to adopt a similar underlying approach based on multiplier-accumulator (MAC) arrays, which places heavy demand on the available on-chip DSP blocks while leaving FPGA logic and memory resources underutilized. The practical outcome is that the computational roof of the accelerator is bound by the number of DSP blocks offered by the target FPGA. In addition, integrating the CNN accelerator with other functional units that also need DSP blocks degrades inference performance. Leveraging the robustness of inference accuracy to limited arithmetic precision, we propose a transformation to the convolution computation that transforms the accelerator design space and relaxes the pressure on the required DSP resources. Through analytical and empirical evaluations, we demonstrate that our approach strikes a favorable balance between utilization of the FPGA on-chip memory, logic, and DSP resources, enabling our accelerator to considerably outperform the state of the art. We report the effectiveness of our approach on a variety of FPGA devices, including Cyclone-V, Stratix-V, and Arria-10, which are used in a large number of applications ranging from embedded settings to high-performance computing. Our proposed technique yields a 1.5x throughput improvement and a 4x DSP resource reduction compared to the best frequency-domain convolution-based accelerator, and a 2.5x boost in raw arithmetic performance and an 8.4x saving in DSPs compared to a state-of-the-art sparse convolution-based accelerator.
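To make the abstract's premise concrete, the sketch below (an illustration written for this record, not the paper's proposed transformed design) shows the direct MAC-based 2D convolution that conventional accelerators unroll in hardware. Each multiply-accumulate in the inner loops corresponds to one hardware multiplier, which on an FPGA is typically mapped to a DSP block; the MAC count therefore indicates why the parallelism, and hence the throughput roof, of such designs is bound by the device's DSP budget.

```python
# Illustrative baseline (hypothetical helper, not from the paper): a
# direct MAC-array style 2D convolution with "valid" padding. When an
# accelerator unrolls these loops spatially, each multiply-accumulate
# below occupies one DSP-block multiplier.
def conv2d_mac(image, kernel):
    H, W = len(image), len(image[0])
    K = len(kernel)
    out = [[0] * (W - K + 1) for _ in range(H - K + 1)]
    macs = 0  # total multiply-accumulate operations performed
    for r in range(H - K + 1):
        for c in range(W - K + 1):
            acc = 0
            for i in range(K):
                for j in range(K):
                    acc += image[r + i][c + j] * kernel[i][j]  # one MAC
                    macs += 1
            out[r][c] = acc
    return out, macs

# A 3x3 image with a 2x2 kernel needs K*K MACs per output pixel:
out, macs = conv2d_mac([[1, 1, 1], [1, 1, 1], [1, 1, 1]],
                       [[1, 1], [1, 1]])
# out == [[4, 4], [4, 4]], macs == 16
```

The paper's contribution, per the abstract, is to transform this computation (exploiting tolerance to limited arithmetic precision) so that less of the work lands on DSP multipliers and more on the otherwise-underutilized logic and memory resources.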
Pages: 4867-4880 (14 pages)