High-Performance CUDA Kernel Execution on FPGAs

Cited by: 3
Authors
Papakonstantinou, Alexandros [1]
Gururaj, Karthik
Stratton, John A. [1]
Chen, Deming [1]
Cong, Jason
Hwu, Wen-Mei W. [1]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Keywords
High performance computing; high-level synthesis; coarse-grained parallelism; FPGA; GPU; CUDA programming model
DOI
10.1145/1542275.1542357
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the parallelism exposed in CUDA kernels onto reconfigurable devices. The CUDA programming model offers a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
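To make the idea of turning SPMD thread blocks into C code concrete, the following is a minimal sketch in plain C. It is not the authors' generated AutoPilot-C output and uses no AutoPilot-specific pragmas. It only illustrates the general technique of serializing the threads of a block into an explicit loop, under the assumption of a simple kernel with no barriers or shared memory. The kernel `vec_add` and the functions `vec_add_block` and `vec_add_grid` are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical CUDA source kernel being modeled (illustration only):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block written as a plain C loop over thread indices.
 * The iterations are independent, so a synthesis tool is free to
 * unroll or pipeline the loop; that step is not shown here. */
void vec_add_block(const float *a, const float *b, float *c,
                   int n, int block_idx, int block_dim)
{
    assert(block_dim > 0);
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
        int i = block_idx * block_dim + thread_idx;
        if (i < n)  /* same bounds guard as the kernel */
            c[i] = a[i] + b[i];
    }
}

/* The grid written as a loop over blocks. Blocks are independent,
 * so each one could in principle be mapped to its own core. */
void vec_add_grid(const float *a, const float *b, float *c,
                  int n, int block_dim)
{
    assert(block_dim > 0);
    int grid_dim = (n + block_dim - 1) / block_dim;  /* ceiling division */
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx)
        vec_add_block(a, b, c, n, block_idx, block_dim);
}
```

Calling `vec_add_grid(a, b, c, n, block_dim)` computes the same result as launching the hypothetical kernel with `block_dim` threads per block. Kernels that use `__syncthreads()` would need the thread loop split at each barrier, which this sketch does not cover.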
Pages: 515-516
Page count: 2