The TRegion Interface and Compiler Optimizations for OPENMP Target Regions

Times cited: 8

Authors:

Doerfert, Johannes [1]

Diaz, Jose Manuel Monsalve [1]

Finkel, Hal [1]

Affiliation:

[1] Argonne Natl Lab, Argonne Leadership Comp Facil, Argonne, IL 60439 USA

Source:

OPENMP: CONQUERING THE FULL HARDWARE SPECTRUM, IWOMP 2019 | 2019 / Vol. 11718

Keywords:

Compiler optimizations; GPU; Accelerator offloading

DOI:

10.1007/978-3-030-28596-8_11

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

OPENMP is a well-established, single-source programming language extension for introducing parallelism into (historically) sequential base languages, namely C/C++ and Fortran. To program not only multi-core CPUs but also many-core processors and massively parallel accelerators, OPENMP 4.0 adopted a flexible offloading scheme inspired by the hierarchy found in many GPU designs. This flexibility allows the offloading scheme to be used in a variety of application scenarios. However, it can also cause a significant performance loss, largely because OPENMP semantics are traditionally interpreted solely in the language front-end, as a way to avoid problems with the "sequential-execution-minded" optimization pipeline. Given the limited analysis and transformation capabilities of a modern compiler front-end, the actual syntax used for OPENMP offloading can substantially affect the observed performance: whenever certain facts are not syntactically obvious, the front-end must favor correct but overly conservative code.

In this work, we investigate how to delay (target-specific) implementation decisions that are currently made early during the compilation of OPENMP offloading code. We prototyped our solution in LLVM/Clang, an industrial-strength OPENMP compiler, to show that semantic source code analyses, rather than the user-provided syntax, can serve as the rationale for these decisions. Our preliminary results on the rather simple Rodinia benchmarks already show speedups of up to 1.55x.
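A minimal C sketch of the syntax sensitivity described in the abstract, under stated assumptions: the two functions below compute the same SAXPY-style update, but the second places an ordinary statement between the `teams` construct and the parallel loop. As we understand it, Clang at the time could emit the efficient SPMD-mode GPU kernel only when it could see syntactically that each team immediately enters the parallel loop (the first form), and otherwise fell back to a more conservative "generic" mode driven by a worker state machine. The function names `saxpy_combined` and `saxpy_split` are hypothetical illustrations, not code from the paper.

```c
#include <stddef.h>

/* Form A (hypothetical example): a combined construct. The front-end can
   see directly that every team starts a parallel loop right away. */
void saxpy_combined(size_t n, float a, const float *x, float *y) {
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
  for (size_t i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}

/* Form B (hypothetical example): the same computation written with split
   constructs. The local statement executed by each team outside any
   parallel region is semantically harmless, but it hides the "tightly
   nested" structure from a purely syntactic analysis. */
void saxpy_split(size_t n, float a, const float *x, float *y) {
#pragma omp target teams map(to: x[0:n]) map(tofrom: y[0:n])
  {
    float scale = a; /* team-local code between teams and parallel */
#pragma omp distribute parallel for
    for (size_t i = 0; i < n; ++i)
      y[i] = scale * x[i] + y[i];
  }
}
```

Built without OpenMP support (no `-fopenmp`), the pragmas are ignored and both functions run sequentially on the host, which is enough to check that they produce identical results; only an offloading build would expose the performance difference the abstract refers to.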

Pages: 153-167

Page count: 15