OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization

Cited by: 167
Authors
Lee, Seyong [1]
Min, Seung-Jai [1]
Eigenmann, Rudolf [1]
Affiliation
[1] Purdue Univ, Sch ECE, W Lafayette, IN 47907 USA
Funding
National Science Foundation (USA);
Keywords
Algorithms; Design; Performance; OpenMP; GPU; CUDA; Automatic Translation; Compiler Optimization; PROGRAMS;
DOI
10.1145/1594835.1504194
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from NVIDIA offers improved programmability for general computing, programming GPGPUs is still complex and error-prone. This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance. Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both regular and irregular applications, leading to performance improvements of up to 50X over the unoptimized translation (up to 328X over serial on a CPU).
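As a rough illustration of the kind of mapping the abstract describes, the sketch below pairs an OpenMP work-sharing loop with a plain C emulation of a CUDA-style launch in which each GPU thread takes one loop iteration. It is not the output of the authors' translator: the function names, the 1-D averaging stencil, and `BLOCK_SIZE` are all hypothetical, and the "grid" is emulated serially on the CPU so the example compiles without a CUDA toolchain.

```c
#define BLOCK_SIZE 4 /* hypothetical number of threads per block */

/* Input form: an OpenMP work-sharing loop over the interior of a 1-D array.
   Without OpenMP support the pragma is ignored and the loop runs serially. */
void stencil_openmp(const float *in, float *out, int n) {
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++) {
        out[i] = 0.5f * (in[i - 1] + in[i + 1]);
    }
}

/* Body that a CUDA kernel could execute: the loop index is rebuilt from the
   block and thread indices, and a bounds guard replaces the loop condition. */
void stencil_kernel_body(const float *in, float *out, int n,
                         int block_idx, int block_dim, int thread_idx) {
    int i = 1 + block_idx * block_dim + thread_idx;
    if (i < n - 1) {
        out[i] = 0.5f * (in[i - 1] + in[i + 1]);
    }
}

/* Serial emulation of a kernel launch: enough blocks are created to cover
   all n - 2 interior iterations, rounding up to a whole number of blocks. */
void stencil_grid_emulated(const float *in, float *out, int n) {
    int iterations = n - 2;
    int num_blocks = (iterations + BLOCK_SIZE - 1) / BLOCK_SIZE;
    for (int b = 0; b < num_blocks; b++) {
        for (int t = 0; t < BLOCK_SIZE; t++) {
            stencil_kernel_body(in, out, n, b, BLOCK_SIZE, t);
        }
    }
}
```

Calling `stencil_openmp` and `stencil_grid_emulated` on the same input should fill the interior elements identically. In this sketch, consecutive thread indices touch consecutive array elements, which is the access pattern that generally favors efficient GPU global memory access.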
Pages: 101-110
Page count: 10