Compiler and runtime support for enabling reduction computations on heterogeneous systems

Cited by: 3
Authors
Ravi, Vignesh T. [1]
Ma, Wenjing [1]
Chiu, David [1]
Agrawal, Gagan [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Source
Keywords
DOI
10.1002/cpe.1848
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
A trend that has drawn considerable attention is the growing heterogeneity of computing platforms. It is now very common for a desktop or notebook computer to come equipped with both a multi-core CPU and a graphics processing unit (GPU). Capitalizing on the full computational power of such architectures (i.e., simultaneously exploiting both the multi-core CPU and the GPU) while starting from a high-level API is a critical challenge. We believe it is highly desirable to give programmers a simple way to realize the full potential of today's heterogeneous machines. This paper describes a compiler and runtime framework that can map a class of applications, namely those characterized by generalized reductions, to a system with a multi-core CPU and a GPU. Starting with simple C functions with added annotations, we automatically generate the middleware API code for the multi-core CPU, as well as CUDA code to exploit the GPU simultaneously. The runtime system provides efficient schemes for dynamically partitioning the work between the CPU cores and the GPU. Our experimental results from two applications, k-means clustering and principal component analysis, show that by effectively harnessing the heterogeneous architecture we can achieve significantly higher performance than by using only the GPU or only the multi-core CPU. In k-means clustering, the heterogeneous version with eight CPU cores and a GPU achieved a speedup of about 32.09x relative to the one-thread CPU version. Compared with the faster of the CPU-only and GPU-only executions, it achieved a performance gain of about 60%. In principal component analysis, the heterogeneous version attained a speedup of 10.4x relative to the one-thread CPU version. Compared with the faster of the CPU-only and GPU-only versions, the heterogeneous version achieved a performance gain of about 63.8%. Copyright (C) 2011 John Wiley & Sons, Ltd.
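The two ideas the abstract combines, a generalized reduction and dynamic partitioning of work between devices, can be illustrated with a small sketch. This is not the paper's framework or its generated code: it is a plain-Python model, under the assumption that two threads labelled "cpu" and "gpu" stand in for the real devices, that each worker pulls fixed-size chunks from a shared queue, and that each worker folds its chunks into a private reduction object that is merged at the end. All function names and the chunk_size parameter are hypothetical.

```python
import queue
import threading


def new_robj(num_centers, dim):
    # Reduction object: one (coordinate sums, point count) pair per center.
    return [([0.0] * dim, 0) for _ in range(num_centers)]


def local_reduce(chunk, centers, robj):
    # Generalized reduction step: fold every point of a chunk into robj.
    for p in chunk:
        k = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        sums, count = robj[k]
        robj[k] = ([s + a for s, a in zip(sums, p)], count + 1)


def merge(dst, src):
    # Global combination: the update is associative and commutative,
    # so private reduction objects can be merged in any order.
    for k, (sums, count) in enumerate(src):
        dsums, dcount = dst[k]
        dst[k] = ([a + b for a, b in zip(dsums, sums)], dcount + count)


def heterogeneous_reduce(points, centers, chunk_size=4, workers=("cpu", "gpu")):
    # Assumption: threads named "cpu" and "gpu" model the two devices.
    chunks = queue.Queue()
    for i in range(0, len(points), chunk_size):
        chunks.put(points[i:i + chunk_size])
    dim = len(centers[0])
    private = {name: new_robj(len(centers), dim) for name in workers}
    processed = {name: 0 for name in workers}

    def run(name):
        # Dynamic partitioning: a worker takes the next chunk whenever it
        # is free, so a faster device ends up processing more chunks.
        while True:
            try:
                chunk = chunks.get_nowait()
            except queue.Empty:
                return
            local_reduce(chunk, centers, private[name])
            processed[name] += 1

    threads = [threading.Thread(target=run, args=(n,)) for n in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = new_robj(len(centers), dim)
    for name in workers:
        merge(total, private[name])
    # One k-means iteration: new center = mean of the points assigned to it.
    new_centers = [[s / count for s in sums] if count else list(centers[k])
                   for k, (sums, count) in enumerate(total)]
    return new_centers, processed
```

Calling heterogeneous_reduce(points, centers) returns the updated centers together with a per-worker count of processed chunks. Because each worker writes only to its own reduction object, the sketch needs no locking during the reduction itself; the only shared structure is the chunk queue.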
Pages: 463-480
Page count: 18