High-Performance CUDA Kernel Execution on FPGAs

Cited by: 3
Authors
Papakonstantinou, Alexandros [1]
Gururaj, Karthik
Stratton, John A. [1]
Chen, Deming [1]
Cong, Jason
Hwu, Wen-Mei W. [1]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
Keywords
High performance computing; high-level synthesis; coarse-grained parallelism; FPGA; GPU; CUDA programming model
DOI
10.1145/1542275.1542357
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The CUDA programming model offers a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
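The abstract does not show the generated code, so the following is only a minimal sketch of the general idea of turning an SPMD CUDA thread block into plain C with explicit loops. It assumes a simple vector-addition kernel; the names `vec_add_block` and `vec_add_grid` are hypothetical, and no AutoPilot-specific pragmas or annotations are shown because the record does not describe them. The implicit `threadIdx` and `blockIdx` indices become ordinary loop variables, and because the iterations are independent, a high-level synthesis tool could in principle unroll or replicate them in hardware.

```c
#include <assert.h>

/* Original CUDA kernel, shown for reference only (not compiled here):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block: the implicit CUDA threads become an explicit loop.
 * Each iteration touches a distinct element, so iterations are independent. */
static void vec_add_block(const float *a, const float *b, float *c, int n,
                          int block_idx, int block_dim)
{
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
        int i = block_idx * block_dim + thread_idx;
        if (i < n) {              /* same bounds guard as the CUDA kernel */
            c[i] = a[i] + b[i];
        }
    }
}

/* The grid: thread blocks are independent of one another. This sketch runs
 * them sequentially; mapping each block to its own core is an assumption
 * about how coarse-grained parallelism might be exploited, not shown here. */
void vec_add_grid(const float *a, const float *b, float *c, int n,
                  int grid_dim, int block_dim)
{
    assert(grid_dim > 0 && block_dim > 0);
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        vec_add_block(a, b, c, n, block_idx, block_dim);
    }
}
```

Calling `vec_add_grid(a, b, c, n, grid_dim, block_dim)` plays the role of the CUDA launch `vec_add<<<grid_dim, block_dim>>>(a, b, c, n)`; as in CUDA, `grid_dim * block_dim` should be at least `n` for every element to be processed.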
Pages: 515-516
Page count: 2