High-Performance CUDA Kernel Execution on FPGAs

Cited by: 3

Authors:

Papakonstantinou, Alexandros [1]

Gururaj, Karthik

Stratton, John A. [1]

Chen, Deming [1]

Cong, Jason

Hwu, Wen-Mei W. [1]

Affiliations:

[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA

Source:

ICS'09: PROCEEDINGS OF THE 2009 ACM SIGARCH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING | 2009

Keywords:

High performance computing; high-level synthesis; coarse-grained parallelism; FPGA; GPU; CUDA programming model

DOI:

10.1145/1542275.1542357

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of the FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
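The abstract does not show what the generated code looks like, so the following is only a rough illustration of the general idea of expressing an SPMD thread block as ordinary C loops. It is a minimal sketch under stated assumptions: the vector-add kernel and the names `vec_add_block` and `vec_add_grid` are hypothetical and not taken from the paper, and no AutoPilot-specific pragmas or directives are shown.

```c
#include <assert.h>

/* Hypothetical CUDA kernel being modeled (an assumption, not from the paper):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block written as a plain C loop over its threads.
 * Each iteration plays the role of one CUDA thread, so the implicit
 * threadIdx.x becomes the explicit loop variable thread_idx. */
static void vec_add_block(const float *a, const float *b, float *c,
                          int n, int block_idx, int block_dim)
{
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
        int i = block_idx * block_dim + thread_idx;
        if (i < n) {               /* same bounds guard as the kernel */
            c[i] = a[i] + b[i];
        }
    }
}

/* The grid launch written as a loop over thread blocks. The blocks are
 * independent of one another, which is the coarse-grained parallelism
 * a synthesis tool could in principle map to parallel hardware cores. */
void vec_add_grid(const float *a, const float *b, float *c,
                  int n, int block_dim)
{
    assert(block_dim > 0);
    int grid_dim = (n + block_dim - 1) / block_dim;  /* ceiling division */
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        vec_add_block(a, b, c, n, block_idx, block_dim);
    }
}
```

Calling `vec_add_grid(a, b, c, n, block_dim)` on three arrays of length `n` computes the same result as the modeled kernel would for a one-dimensional launch. Kernels that use `__syncthreads()` or shared memory would need the thread loop split at each barrier, which this sketch does not cover.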


Pages: 515-516

Page count: 2
