Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs

Cited by: 1
Authors
Tapiador-Morales, Ricardo [1]
Rios-Navarro, Antonio [1]
Linares-Barranco, Alejandro [1]
Kim, Minkyu [2]
Kadetotad, Deepak [2]
Seo, Jae-sun [2]
Affiliations
[1] Univ Seville, Robot & Technol Comp Lab, Seville, Spain
[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ USA
Keywords
Deep learning; Convolutional Neural Network; Hardware acceleration; OpenCL; FPGA; Caffe; Xilinx; Altera;
DOI
10.1007/978-3-319-59147-6_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Of special interest are Convolutional Neural Networks (CNNs), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks, since the many convolutional and fully connected layers demand a large amount of communication for parallel computation. Multi-core CPU-based solutions have proven inadequate for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored, since they allow the memory hierarchy to be implemented using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replication and inconsistencies, and makes FPGAs potentially powerful solutions for real-time classification with CNNs. In this paper, the OpenCL co-design frameworks adopted by both Altera and Xilinx for pseudo-automatic development are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platform tools, a mature design community and better execution times.
Pages: 271-282
Page count: 12