Compiler and runtime support for enabling reduction computations on heterogeneous systems

Cited by: 3
Authors
Ravi, Vignesh T. [1]
Ma, Wenjing [1]
Chiu, David [1]
Agrawal, Gagan [1]
Affiliation
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2012 / Vol. 24 / No. 5
DOI
10.1002/cpe.1848
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
One trend that has materialized and attracted considerable attention is the increasing heterogeneity of computing platforms. It is now very common for a desktop or notebook computer to be equipped with both a multi-core CPU and a graphics processing unit (GPU). Harnessing the full computational power of such architectures (i.e., exploiting the multi-core CPU and the GPU simultaneously), starting from a high-level API, is a critical challenge. We believe it is highly desirable to give programmers a simple way to realize the full potential of today's heterogeneous machines. This paper describes a compiler and runtime framework that maps a class of applications, namely those characterized by generalized reductions, to a system with a multi-core CPU and a GPU. Starting from simple C functions with added annotations, we automatically generate middleware API code for the multi-core CPU, as well as CUDA code to exploit the GPU simultaneously. The runtime system provides efficient schemes for dynamically partitioning the work between the CPU cores and the GPU. Experimental results from two applications, namely k-means clustering and principal component analysis (PCA), show that by effectively harnessing the heterogeneous architecture we achieve significantly higher performance than by using only the GPU or only the multi-core CPU. For k-means clustering, the heterogeneous version with eight CPU cores and a GPU achieved a speedup of about 32.09x relative to the single-threaded CPU version; compared with the faster of the CPU-only and GPU-only executions, the performance gain was about 60%. For PCA, the heterogeneous version attained a speedup of 10.4x relative to the single-threaded CPU version and a performance gain of about 63.8% over the faster of the CPU-only and GPU-only versions. Copyright (C) 2011 John Wiley & Sons, Ltd.
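The abstract describes generalized reductions and dynamic CPU/GPU work partitioning only at a high level. The Python sketch below illustrates both ideas for a single k-means iteration, under stated assumptions. Each point is mapped to a key (its nearest centroid) and folded into a private reduction object. The per-worker objects are then merged. Workers pull fixed-size chunks from a shared queue, so a faster device naturally ends up taking more chunks. The worker names "cpu" and "gpu", the chunk size, and every function name are hypothetical. Both workers are ordinary Python threads standing in for device code, so this is not the paper's generated middleware or CUDA code.

```python
import threading
from queue import Queue, Empty


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for k, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = k, d
    return best


def new_robj(k, dim):
    """Reduction object: per-centroid coordinate sums followed by a point count."""
    return [[0.0] * dim + [0] for _ in range(k)]


def local_reduce(robj, chunk, centroids):
    """Generalized reduction over one chunk: key = process(e); RObj[key] op= value."""
    for point in chunk:
        slot = robj[nearest(point, centroids)]
        for j, x in enumerate(point):
            slot[j] += x
        slot[-1] += 1


def merge(robjs):
    """Global combination step: element-wise sum of the per-worker reduction objects."""
    out = [row[:] for row in robjs[0]]
    for r in robjs[1:]:
        for row_out, row in zip(out, r):
            for j, v in enumerate(row):
                row_out[j] += v
    return out


def kmeans_step(points, centroids, chunk_size=4, workers=("cpu", "gpu")):
    """One k-means iteration; workers (hypothetical names) pull chunks on demand."""
    k, dim = len(centroids), len(centroids[0])
    chunks = Queue()
    for i in range(0, len(points), chunk_size):
        chunks.put(points[i:i + chunk_size])
    robjs = {w: new_robj(k, dim) for w in workers}  # one private object per worker
    processed = {w: 0 for w in workers}             # points handled by each worker

    def worker(name):
        while True:
            try:
                chunk = chunks.get_nowait()
            except Empty:
                return  # queue drained: this worker is done
            local_reduce(robjs[name], chunk, centroids)
            processed[name] += len(chunk)

    threads = [threading.Thread(target=worker, args=(w,)) for w in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = merge(list(robjs.values()))
    # New centroid = mean of assigned points; an empty cluster keeps its old centroid.
    new_centroids = [
        [s / row[-1] for s in row[:-1]] if row[-1] else list(centroids[i])
        for i, row in enumerate(total)
    ]
    return new_centroids, processed


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    new, counts = kmeans_step(pts, [(0.0, 0.0), (10.0, 10.0)], chunk_size=2)
    print(new, counts)
```

Because each worker updates only its own reduction object, no locking is needed during the reduction phase. The only synchronization points are the shared chunk queue and the final merge.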
Pages: 463-480
Number of pages: 18