Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs

Cited by: 1
Authors
Tapiador-Morales, Ricardo [1]
Rios-Navarro, Antonio [1]
Linares-Barranco, Alejandro [1]
Kim, Minkyu [2]
Kadetotad, Deepak [2]
Seo, Jae-sun [2]
Affiliations
[1] Univ Seville, Robot & Technol Comp Lab, Seville, Spain
[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ USA
Keywords
Deep learning; Convolutional Neural Network; Hardware acceleration; OpenCL; FPGA; Caffe; Xilinx; Altera
DOI
10.1007/978-3-319-59147-6_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Of special interest are Convolutional Neural Networks (CNNs), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures face memory bottlenecks, because their many convolutional and fully connected layers demand a large amount of communication for parallel computation. Multi-core CPU-based solutions have proven inadequate for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for execution on GPGPUs or FPGAs. FPGA design solutions are also being actively explored; they allow the memory hierarchy to be implemented using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements by multiple processing units, avoiding data replication and inconsistencies, and makes FPGAs potentially powerful solutions for real-time classification with CNNs. This paper evaluates the OpenCL co-design frameworks adopted by both Altera and Xilinx as pseudo-automatic development solutions. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance, and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization, and more compact boards. Altera provides multi-platform tools, a mature design community, and better execution times.
Pages: 271-282
Number of pages: 12