OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization

Cited by: 167
Authors
Lee, Seyong [1]
Min, Seung-Jai [1]
Eigenmann, Rudolf [1]
Affiliations
[1] Purdue Univ, Sch ECE, W Lafayette, IN 47907 USA
Funding
National Science Foundation (USA)
Keywords
Algorithms; Design; Performance; OpenMP; GPU; CUDA; Automatic Translation; Compiler Optimization; PROGRAMS;
DOI
10.1145/1594835.1504194
Chinese Library Classification number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from NVIDIA offers improved programmability for general computing, programming GPGPUs is still complex and error-prone. This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance. Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both regular and irregular applications, leading to performance improvements of up to 50X over the unoptimized translation (up to 328X over serial on a CPU).
Pages: 101-110
Number of pages: 10