Compiler and runtime support for enabling reduction computations on heterogeneous systems

Cited by: 3
Authors
Ravi, Vignesh T. [1 ]
Ma, Wenjing [1 ]
Chiu, David [1 ]
Agrawal, Gagan [1 ]
Affiliation
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2012, Vol. 24, No. 05
Keywords
DOI
10.1002/cpe.1848
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Computing platforms are becoming increasingly heterogeneous, a trend that has attracted much attention. It is now very common for a desktop or notebook computer to come equipped with both a multi-core CPU and a graphics processing unit (GPU). Capitalizing on the maximum computational power of such architectures (i.e., by simultaneously exploiting both the multi-core CPU and the GPU), starting from a high-level API, is a critical challenge. We believe that it would be highly desirable to support a simple way for programmers to realize the full potential of today's heterogeneous machines. This paper describes a compiler and runtime framework that can map a class of applications, namely those characterized by generalized reductions, to a system with a multi-core CPU and a GPU. Starting with simple C functions with added annotations, we automatically generate the middleware API code for the multi-core CPU, as well as CUDA code to exploit the GPU simultaneously. The runtime system provides efficient schemes for dynamically partitioning the work between the CPU cores and the GPU. Our experimental results from two applications, k-means clustering and principal component analysis, show that, by effectively harnessing the heterogeneous architecture, we can achieve significantly higher performance than with the GPU or the multi-core CPU alone. In k-means clustering, the heterogeneous version with eight CPU cores and a GPU achieved a speedup of about 32.09x relative to the one-thread CPU version. When compared with the faster of the CPU-only and GPU-only executions, we were able to achieve a performance gain of about 60%. In principal component analysis, the heterogeneous version attained a speedup of 10.4x relative to the one-thread CPU version. When compared with the faster of the CPU-only and GPU-only versions, the heterogeneous version achieved a performance gain of about 63.8%. Copyright (C) 2011 John Wiley & Sons, Ltd.
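The abstract names two ideas that a short sketch may make more concrete: a generalized reduction (each element updates a small reduction object, and per-worker copies are combined at the end) and dynamic partitioning of work between devices of different speeds. The Python sketch below is illustrative only and is not the paper's framework, annotations, or generated code. Every name in it (`make_reduction_object`, `local_reduce`, `combine`, `heterogeneous_reduce`) is hypothetical, plain threads stand in for the CPU cores and the GPU, and a larger chunk size stands in for the faster device. It uses the accumulation step of k-means as the reduction.

```python
import threading


def make_reduction_object(k, dim):
    """Hypothetical reduction object: per-cluster coordinate sums and counts."""
    return {"sums": [[0.0] * dim for _ in range(k)], "counts": [0] * k}


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def local_reduce(chunk, centers, robj):
    """Fold one chunk of points into a worker-private reduction object."""
    for point in chunk:
        i = nearest(point, centers)
        robj["counts"][i] += 1
        for j, value in enumerate(point):
            robj["sums"][i][j] += value


def combine(target, other):
    """Merge another reduction object into target (associative, commutative)."""
    for i in range(len(target["counts"])):
        target["counts"][i] += other["counts"][i]
        for j in range(len(target["sums"][i])):
            target["sums"][i][j] += other["sums"][i][j]
    return target


def heterogeneous_reduce(points, centers, workers):
    """workers is a list of (unique_name, chunk_size) pairs.

    Each worker repeatedly claims the next unprocessed chunk from a shared
    index, so a worker that finishes sooner simply claims more chunks.
    """
    k, dim = len(centers), len(centers[0])
    lock = threading.Lock()
    state = {"next": 0}
    partials = [make_reduction_object(k, dim) for _ in workers]
    processed = {name: 0 for name, _ in workers}

    def run(slot, name, chunk_size):
        while True:
            with lock:  # claim a chunk atomically
                start = state["next"]
                if start >= len(points):
                    return
                end = min(start + chunk_size, len(points))
                state["next"] = end
            local_reduce(points[start:end], centers, partials[slot])
            processed[name] += end - start  # each worker writes only its own key

    threads = [
        threading.Thread(target=run, args=(slot, name, size))
        for slot, (name, size) in enumerate(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    result = make_reduction_object(k, dim)
    for partial in partials:  # global combination phase
        combine(result, partial)
    return result, processed
```

Because the combine step is associative and commutative, the final reduction object does not depend on which worker processed which chunk; only the per-worker share of the work varies from run to run. A call such as `heterogeneous_reduce(points, centers, [("cpu", 2), ("gpu", 8)])` would model a GPU that takes larger chunks than a CPU core, with both chunk sizes being arbitrary assumptions.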
Pages: 463 - 480
Page count: 18
Related papers
50 in total
  • [31] Automatic Partitioning of Stencil Computations on Heterogeneous Systems
    Pereira, Alyson D.
    Rocha, Rodrigo C. O.
    Ramos, Luiz
    Castro, Marcio
    Goes, Luis F. W.
    2017 INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING WORKSHOPS (SBAC-PADW), 2017, : 43 - 48
  • [32] HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations
    Zheng, Zhen
    Oh, Chanyoung
    Zhai, Jidong
    Shen, Xipeng
    Yi, Youngmin
    Chen, Wenguang
    TWENTY-FOURTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXIV), 2019, : 153 - 166
  • [33] Enabling run-time composition and support for heterogeneous pervasive multi-agent systems
    Jayaputera, G. T.
    Zaslavsky, A.
    Loke, S. W.
    JOURNAL OF SYSTEMS AND SOFTWARE, 2007, 80 (12) : 2039 - 2062
  • [34] Enabling Failure-Resilient Intermittent Systems Without Runtime Checkpointing
    Chen, Wei-Ming
    Kuo, Tei-Wei
    Hsiu, Pi-Cheng
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2020, 39 (12) : 4399 - 4412
  • [35] A novel compiler support for automatic parallelization on multicore systems
    Andion, Jose M.
    Arenaz, Manuel
    Rodriguez, Gabriel
    Tourino, Juan
    PARALLEL COMPUTING, 2013, 39 (09) : 442 - 460
  • [36] Compiler-assisted Data Placement for Heterogeneous Memory Systems
    Kim, Hwajung
    IEICE ELECTRONICS EXPRESS, 2024, 21 (19):
  • [37] A Parallelizing Matlab Compiler Framework and Run time for Heterogeneous Systems
    Skalicky, Sam
    Lopez, Sonia
    Lukowiak, Marcin
    Schmidt, Andrew G.
    2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 232 - 237
  • [38] MOLAR: Adaptive runtime support for high-end computing operating and runtime systems
    Engelmann, Christian
    Scott, Stephen L.
    Bernholdt, David E.
    Gottumukkala, Narasimha R.
    Leangsuksun, Chokchai
    Varma, Jyothish
    Wang, Chao
    Mueller, Frank
    Shet, Aniruddha G.
    Sadayappan, P.
    Operating Systems Review (ACM), 2006, 40 (02) : 63 - 72
  • [39] Runtime Techniques for Efficient Ray-Tracing on Heterogeneous Systems
    Kao, Chih-Chen
    Hsu, Wei-Chung
    2015 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2015, : 100 - 104
  • [40] Heterogeneous Managed Runtime Systems: A Computer Vision Case Study
    Kotselidis, Christos
    Clarkson, James
    Rodchenko, Andrey
    Nisbet, Andy
    Mawer, John
    Lujan, Mikel
    ACM SIGPLAN NOTICES, 2017, 52 (07) : 74 - 82