High-Performance CUDA Kernel Execution on FPGAs

Cited by: 3

Authors:

Papakonstantinou, Alexandros [1]

Gururaj, Karthik

Stratton, John A. [1]

Chen, Deming [1]

Cong, Jason

Hwu, Wen-Mei W. [1]

Affiliations:

[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA

Source:

ICS'09: PROCEEDINGS OF THE 2009 ACM SIGARCH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING | 2009

Keywords:

High performance computing; high-level synthesis; coarse-grained parallelism; FPGA; GPU; CUDA programming model

DOI:

10.1145/1542275.1542357

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.


Pages: 515-516

Page count: 2

50 items in total

[1] Efficient Compilation of CUDA Kernels for High-Performance Computing on FPGAs
Papakonstantinou, Alexandros
Gururaj, Karthik
Stratton, John A.
Chen, Deming
Cong, Jason
Hwu, Wen-Mei W.
ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2013, 13 (02)
[2] On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering
Wende, Florian
Cordes, Frank
Steinke, Thomas
2012 SYMPOSIUM ON APPLICATION ACCELERATORS IN HIGH PERFORMANCE COMPUTING (SAAHPC), 2012, : 74 - 83
[3] High-Performance QR Decomposition for FPGAs
Langhammer, Martin
Pasca, Bogdan
PROCEEDINGS OF THE 2018 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'18), 2018, : 183 - 188
[4] High-performance symmetric block ciphers on CUDA
Dept. of Computer Science, National Defense Academy of Japan, Kanagawa, Japan
Proc. - Int. Conf. Networking Comput., ICNC: 221 - 227
[5] Predicting execution time of CUDA kernel using static analysis
Alavani, Gargi
Varma, Kajal
Sarkar, Santonu
2018 IEEE INT CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, UBIQUITOUS COMPUTING & COMMUNICATIONS, BIG DATA & CLOUD COMPUTING, SOCIAL COMPUTING & NETWORKING, SUSTAINABLE COMPUTING & COMMUNICATIONS, 2018, : 948 - 955
[6] High-performance and parameterized matrix factorization on FPGAs
Zhuo, Ling
Prasanna, Viktor K.
2006 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, PROCEEDINGS, 2006, : 363 - 368
[7] Integrating FPGAs in High-Performance Computing: Introduction
Chow, Paul
Hutton, Mike
FPGA 2007: FIFTEENTH ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2007, : 131 - 131
[8] Optimizations for High-Performance IPsec Execution
Iatrou, Michael G.
Voyiatzis, Artemios G.
Serpanos, Dimitrios N.
E-BUSINESS AND TELECOMMUNICATIONS, 2011, 130 : 199 - 211
[9] Co-designing Trusted Execution Environment and Model Encryption for Secure High-Performance DNN Inference on FPGAs
Nakai, Tsunato
Yamamoto, Ryo
2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024
[10] A KERNEL FOR HIGH-PERFORMANCE MULTICAST COMMUNICATIONS
GAIT, J
IEEE TRANSACTIONS ON COMPUTERS, 1989, 38 (02) : 218 - 226
