High-Performance CUDA Kernel Execution on FPGAs

Cited by: 3
Authors
Papakonstantinou, Alexandros [1]
Gururaj, Karthik
Stratton, John A. [1]
Chen, Deming [1]
Cong, Jason
Hwu, Wen-Mei W. [1]
Affiliation
[1] University of Illinois, Department of Electrical and Computer Engineering, Urbana, IL 61801, USA
Keywords
High performance computing; high-level synthesis; coarse-grained parallelism; FPGA; GPU; CUDA programming model
DOI
10.1145/1542275.1542357
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we propose a new FPGA design flow that combines Nvidia's CUDA programming model with AutoPilot, a state-of-the-art high-level synthesis tool from AutoESL, to efficiently map the parallelism exposed in CUDA kernels onto reconfigurable devices. Using the CUDA programming model provides a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
Pages: 515-516
Number of pages: 2