High-Performance CUDA Kernel Execution on FPGAs

Cited by: 3

Authors:

Papakonstantinou, Alexandros [1]

Gururaj, Karthik

Stratton, John A. [1]

Chen, Deming [1]

Cong, Jason

Hwu, Wen-Mei W. [1]

Affiliations:

[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA

Source:

ICS'09: PROCEEDINGS OF THE 2009 ACM SIGARCH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING | 2009

Keywords:

High performance computing; high-level synthesis; coarse-grained parallelism; FPGA; GPU; CUDA programming model

DOI:

10.1145/1542275.1542357

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL, to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of the FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
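
The abstract does not show what the transformed code looks like. As a rough illustration only, the sketch below shows one common way an SPMD thread block can be expressed in plain C: the implicit per-thread execution becomes an explicit loop over the thread index, and the grid becomes a loop over blocks. The kernel (a SAXPY-style update), the names saxpy_block and saxpy_grid, and the block size BLOCK_DIM are all hypothetical; this is not the code the authors' flow generates, and no AutoPilot-specific directives are shown.

```c
/* Hypothetical threads-per-block value, chosen only for illustration. */
#define BLOCK_DIM 4

/* One thread block of a SAXPY-style kernel, y[i] = a * x[i] + y[i].
 * In CUDA each thread would compute one i from blockIdx and threadIdx;
 * here the threads of the block are serialized into an explicit loop.
 * The iterations are independent, which is the kind of parallelism a
 * synthesis tool could in principle unroll or pipeline. */
static void saxpy_block(int block_idx, int n, float a,
                        const float *x, float *y)
{
    for (int thread_idx = 0; thread_idx < BLOCK_DIM; ++thread_idx) {
        int i = block_idx * BLOCK_DIM + thread_idx;
        if (i < n) {               /* same bounds guard a CUDA kernel would use */
            y[i] = a * x[i] + y[i];
        }
    }
}

/* The grid becomes a loop over thread blocks. Blocks do not depend on
 * each other, so each could map to a separate hardware core. */
static void saxpy_grid(int n, float a, const float *x, float *y)
{
    int grid_dim = (n + BLOCK_DIM - 1) / BLOCK_DIM;  /* ceil(n / BLOCK_DIM) */
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        saxpy_block(block_idx, n, a, x, y);
    }
}
```

Calling saxpy_grid(n, a, x, y) on two arrays of length n updates y in place; a kernel that uses barrier synchronization inside a block would need its thread loop split at each barrier, which this sketch does not cover.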

Pages: 515-516

Page count: 2
