The TRegion Interface and Compiler Optimizations for OPENMP Target Regions

Cited by: 8
Authors
Doerfert, Johannes [1 ]
Diaz, Jose Manuel Monsalve [1 ]
Finkel, Hal [1 ]
Affiliation
[1] Argonne Natl Lab, Argonne Leadership Comp Facil, Argonne, IL 60439 USA
Keywords
Compiler optimizations; GPU; Accelerator offloading
DOI
10.1007/978-3-030-28596-8_11
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
OPENMP is a well-established, single-source programming-language extension that introduces parallelism into (historically) sequential base languages, namely C/C++ and Fortran. To program not only multi-core CPUs but also many-core processors and heavily parallel accelerators, OPENMP 4.0 adopted a flexible offloading scheme inspired by the hierarchy in many GPU designs. The flexibility of the offloading scheme allows it to be used in various application scenarios. However, it may also result in a significant performance loss, especially because OPENMP semantics are traditionally interpreted solely in the language front end as a way to avoid problems with the "sequential-execution-minded" optimization pipeline. Given the limited analysis and transformation capabilities of a modern compiler front end, the actual syntax used for OPENMP offloading can substantially impact the observed performance. The compiler front end will always have to favor correct but overly conservative code if certain facts are not syntactically obvious.

In this work, we investigate how we can delay (target-specific) implementation decisions that are currently taken early during the compilation of OPENMP offloading code. We prototyped our solution in LLVM/Clang, an industrial-strength OPENMP compiler, to show that we can use semantic source-code analyses as a rationale instead of relying on the user-provided syntax. Our preliminary results on the rather simple Rodinia benchmarks already show speedups of up to 1.55x.
Pages: 153 - 167
Page count: 15
Related Papers
50 items
  • [21] OpenMP compiler for distributed memory architectures
    Jue Wang
    ChangJun Hu
    JiLin Zhang
    JianJiang Li
    Science China Information Sciences, 2010, 53 : 932 - 944
  • [22] Performance evaluation of the Omni OpenMP compiler
    Kusano, K
    Satoh, S
    Sato, M
    HIGH PERFORMANCE COMPUTING, PROCEEDINGS, 2000, 1940 : 403 - 414
  • [23] A practical OpenMP compiler for system on chips
    Liu, F
    Chaudhary, V
    OPENMP SHARED MEMORY PARALLEL PROGRAMMING, 2003, 2716 : 54 - 68
  • [24] OpenMP compiler for distributed memory architectures
    Wang Jue
    Hu ChangJun
    Zhang JiLin
    Li JianJiang
    SCIENCE CHINA-INFORMATION SCIENCES, 2010, 53 (05) : 932 - 944
  • [25] OMPCUDA : OpenMP Execution Framework for CUDA Based on Omni OpenMP Compiler
    Ohshima, Satoshi
    Hirasawa, Shoichi
    Honda, Hiroki
    BEYOND LOOP LEVEL PARALLELISM IN OPENMP: ACCELERATORS, TASKING AND MORE, PROCEEDINGS, 2010, 6132 : 161 - +
  • [26] Compiler optimizations to reduce security overhead
    Zhang, Tao
    Zhuang, Xiaotong
    Pande, Santosh
    CGO 2006: 4TH INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, 2006, : 346 - +
  • [27] Automatically proving the correctness of compiler optimizations
    Lerner, S
    Millstein, T
    Chambers, C
    ACM SIGPLAN NOTICES, 2003, 38 (05) : 220 - 231
  • [28] An Automatic Tool for Tuning Compiler Optimizations
    Plotnikov, Dmitry
    Melnik, Dmitry
    Vardanyan, Mamikon
    Buchatskiy, Ruben
    Zhuykov, Roman
    2013 COMPUTER SCIENCE AND INFORMATION TECHNOLOGIES (CSIT), 2013,
  • [29] Weakest Precondition Synthesis for Compiler Optimizations
    Lopes, Nuno P.
    Monteiro, Jose
    VERIFICATION, MODEL CHECKING, AND ABSTRACT INTERPRETATION: (VMCAI 2014), 2014, 8318 : 203 - 221
  • [30] Compiler optimizations for the PA-8000
    Holler, AM
    IEEE COMPCON 97, PROCEEDINGS, 1997, : 87 - 94