The TRegion Interface and Compiler Optimizations for OPENMP Target Regions

Cited by: 8
Authors
Doerfert, Johannes [1 ]
Diaz, Jose Manuel Monsalve [1 ]
Finkel, Hal [1 ]
Affiliation
[1] Argonne Natl Lab, Argonne Leadership Comp Facil, Argonne, IL 60439 USA
Keywords
Compiler optimizations; GPU; Accelerator offloading
DOI
10.1007/978-3-030-28596-8_11
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
OPENMP is a well-established, single-source programming-language extension that introduces parallelism into (historically) sequential base languages, namely C/C++ and Fortran. To program not only multi-core CPUs but also many-core processors and heavily parallel accelerators, OPENMP 4.0 adopted a flexible offloading scheme inspired by the hierarchy in many GPU designs. The flexibility of the offloading scheme allows it to be used in various application scenarios. However, it may also result in a significant performance loss, especially because OPENMP semantics are traditionally interpreted solely in the language front end as a way to avoid problems with the "sequential-execution-minded" optimization pipeline. Given the limited analysis and transformation capabilities of a modern compiler front end, the actual syntax used for OPENMP offloading can substantially impact the observed performance. The compiler front end will always have to favor correct but overly conservative code if certain facts are not syntactically obvious.

In this work, we investigate how we can delay (target-specific) implementation decisions that are currently taken early during the compilation of OPENMP offloading code. We prototyped our solution in LLVM/Clang, an industrial-strength OPENMP compiler, to show that we can use semantic source-code analyses as a rationale instead of relying on the user-provided syntax. Our preliminary results on the rather simple Rodinia benchmarks already show speedups of up to 1.55x.
Pages: 153 - 167
Page count: 15
Related Papers
50 items
  • [21] OpenMP compiler for distributed memory architectures
    Jue Wang
    ChangJun Hu
    JiLin Zhang
    JianJiang Li
    Science China Information Sciences, 2010, 53 : 932 - 944
  • [22] Performance evaluation of the Omni OpenMP compiler
    Kusano, K
    Satoh, S
    Sato, M
    HIGH PERFORMANCE COMPUTING, PROCEEDINGS, 2000, 1940 : 403 - 414
  • [23] A practical OpenMP compiler for system on chips
    Liu, F
    Chaudhary, V
    OPENMP SHARED MEMORY PARALLEL PROGRAMMING, 2003, 2716 : 54 - 68
  • [24] OpenMP compiler for distributed memory architectures
    Wang Jue
    Hu ChangJun
    Zhang JiLin
    Li JianJiang
    SCIENCE CHINA-INFORMATION SCIENCES, 2010, 53 (05) : 932 - 944
  • [25] OMPCUDA : OpenMP Execution Framework for CUDA Based on Omni OpenMP Compiler
    Ohshima, Satoshi
    Hirasawa, Shoichi
    Honda, Hiroki
    BEYOND LOOP LEVEL PARALLELISM IN OPENMP: ACCELERATORS, TASKING AND MORE, PROCEEDINGS, 2010, 6132 : 161 - +
  • [26] Compiler optimizations to reduce security overhead
    Zhang, Tao
    Zhuang, Xiaotong
    Pande, Santosh
    CGO 2006: 4TH INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, 2006, : 346 - +
  • [27] Automatically proving the correctness of compiler optimizations
    Lerner, S
    Millstein, T
    Chambers, C
    ACM SIGPLAN NOTICES, 2003, 38 (05) : 220 - 231
  • [28] An Automatic Tool for Tuning Compiler Optimizations
    Plotnikov, Dmitry
    Melnik, Dmitry
    Vardanyan, Mamikon
    Buchatskiy, Ruben
    Zhuykov, Roman
    2013 COMPUTER SCIENCE AND INFORMATION TECHNOLOGIES (CSIT), 2013,
  • [29] Weakest Precondition Synthesis for Compiler Optimizations
    Lopes, Nuno P.
    Monteiro, Jose
    VERIFICATION, MODEL CHECKING, AND ABSTRACT INTERPRETATION: (VMCAI 2014), 2014, 8318 : 203 - 221
  • [30] Compiler optimizations for the PA-8000
    Holler, AM
    IEEE COMPCON 97, PROCEEDINGS, 1997, : 87 - 94