OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization

Cited by: 167
Authors
Lee, Seyong [1 ]
Min, Seung-Jai [1 ]
Eigenmann, Rudolf [1 ]
Affiliations
[1] Purdue Univ, Sch ECE, W Lafayette, IN 47907 USA
Funding
National Science Foundation (USA)
Keywords
Algorithms; Design; Performance; OpenMP; GPU; CUDA; Automatic Translation; Compiler Optimization; PROGRAMS;
DOI
10.1145/1594835.1504194
CLC classification number
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although the Compute Unified Device Architecture (CUDA) programming model from NVIDIA offers improved programmability for general computing, programming GPGPUs is still complex and error-prone. This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. We identify several key transformation techniques that enable efficient GPU global memory access and thereby high performance. Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both regular and irregular applications, yielding performance improvements of up to 50X over the unoptimized translation (and up to 328X over serial execution on a CPU).
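As a rough illustration of the kind of OpenMP-to-CUDA mapping the abstract describes — this is a hand-written sketch, not output of the paper's translator, and all names (`saxpy_kernel`, `n`, `a`, `x`, `y`) are hypothetical — a simple OpenMP `parallel for` loop corresponds to a CUDA kernel along these lines:

```cuda
// Original OpenMP source (each loop iteration is independent):
//   #pragma omp parallel for
//   for (int i = 0; i < n; i++)
//       y[i] = a * x[i] + y[i];

// Sketch of a CUDA translation: one loop iteration per GPU thread.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                         // guard against the last partial block
        y[i] = a * x[i] + y[i];
}

// Host side: data must be explicitly moved to and from GPU global memory.
void saxpy(int n, float a, const float *x_h, float *y_h) {
    float *x_d, *y_d;
    cudaMalloc(&x_d, n * sizeof(float));
    cudaMalloc(&y_d, n * sizeof(float));
    cudaMemcpy(x_d, x_h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(y_d, y_h, n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;                          // a common block size
    int blocks  = (n + threads - 1) / threads;  // ceil(n / threads)
    saxpy_kernel<<<blocks, threads>>>(n, a, x_d, y_d);

    cudaMemcpy(y_h, y_d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(x_d);
    cudaFree(y_d);
}
```

Here consecutive threads read consecutive `x[i]` elements, so global memory accesses coalesce — the kind of access pattern the paper's transformation techniques aim to produce automatically.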
Pages: 101-110
Page count: 10
Related papers
50 records total
  • [1] OpenMP to GPGPU: A compiler framework for automatic translation and optimization
    School of ECE, Purdue University, West Lafayette, IN, 47907, United States
    ACM SIGPLAN Not., 2009, 44 (4): 101-110
  • [2] Automatic Optimization of Thread Mapping for a GPGPU Programming Framework
    Ohno, Kazuhiko
    Kamiya, Tomoharu
    Maruyama, Takanori
    Matsumoto, Masaki
    2014 SECOND INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2014, : 198 - 204
  • [3] CCAMP: An Integrated Translation and Optimization Framework for OpenACC and OpenMP
    Lambert, Jacob
    Lee, Seyong
    Vetter, Jeffrey S.
    Malony, Allen D.
    PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20), 2020,
  • [4] FAROS: A Framework to Analyze OpenMP Compilation Through Benchmarking and Compiler Optimization Analysis
    Georgakoudis, Giorgis
    Doerfert, Johannes
    Laguna, Ignacio
    Scogland, Thomas R. W.
    OPENMP: PORTABLE MULTI-LEVEL PARALLELISM ON MODERN SYSTEMS, 2020, 12295 : 3 - 17
  • [5] A GPGPU Compiler for Memory Optimization and Parallelism Management
    Yang, Yi
    Xiang, Ping
    Kong, Jingfei
    Zhou, Huiyang
    ACM SIGPLAN NOTICES, 2010, 45 (06) : 86 - 97
  • [6] A GPGPU Compiler for Memory Optimization and Parallelism Management
    Yang, Yi
    Xiang, Ping
    Kong, Jingfei
    Zhou, Huiyang
    PLDI '10: PROCEEDINGS OF THE 2010 ACM SIGPLAN CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION, 2010, : 86 - 97
  • [7] OMPCUDA : OpenMP Execution Framework for CUDA Based on Omni OpenMP Compiler
    Ohshima, Satoshi
    Hirasawa, Shoichi
    Honda, Hiroki
    BEYOND LOOP LEVEL PARALLELISM IN OPENMP: ACCELERATORS, TASKING AND MORE, PROCEEDINGS, 2010, 6132: 161+
  • [8] A Unified Optimizing Compiler Framework for Different GPGPU Architectures
    Yang, Yi
    Xiang, Ping
    Kong, Jingfei
    Mantor, Mike
    Zhou, Huiyang
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2012, 9 (02)
  • [9] Compiler optimization algorithm for OpenMP parallel program
    School of Information and Engineering, PLA Information and Engineering University, Zhengzhou 450002, China
    Jisuanji Gongcheng (Computer Engineering), 2006, (24): 37-40
  • [10] A Dynamic Optimization Framework for OpenMP
    Wicaksono, Besar
    Nanjegowda, Ramachandra C.
    Chapman, Barbara
    OPENMP IN THE PETASCALE ERA, (IWOMP 2011), 2011, 6665 : 54 - 68