A Case Study for Performance Portability Using OpenMP 4.5

Cited by: 19
Authors
Gayatri, Rahulkumar [1]
Yang, Charlene [1]
Kurth, Thorsten [1]
Deslippe, Jack [1]
Affiliations
[1] Lawrence Berkeley Natl Lab LBNL, Natl Energy Res Sci Comp Ctr NERSC, Berkeley, CA 94720 USA
Source
ACCELERATOR PROGRAMMING USING DIRECTIVES | 2019 / Vol. 11381
Keywords
OpenMP; 3.0; 4.5; OpenACC; CUDA; Parallel programming models; P100; V100; Xeon Phi; Haswell;
DOI
10.1007/978-3-030-12274-4_4
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, the HPC landscape has shifted away from traditional multi-core CPU systems to energy-efficient architectures, such as many-core CPUs and accelerators like GPUs, to achieve high performance. The goal of performance portability is to enable developers to rapidly produce applications which can run efficiently on a variety of these architectures, with little to no architecture-specific code adaptations required. We implement a key kernel from a material science application using OpenMP 3.0, OpenMP 4.5, OpenACC, and CUDA on Intel architectures, Xeon and Xeon Phi, and NVIDIA GPUs, P100 and V100. We will compare the performance of the OpenMP 4.5 implementation with that of the more architecture-specific implementations, examine the performance of the OpenMP 4.5 implementation on CPUs after back-porting, and share our experience optimizing large reduction loops, as well as discuss the latest compiler status for OpenMP 4.5 and OpenACC.
Pages: 75-95
Number of pages: 21
Related papers
50 in total
  • [21] Performance of Parallel Algorithms Using OpenMP
    Mego, Roman
    Fryza, Tomas
    2013 23RD INTERNATIONAL CONFERENCE RADIOELEKTRONIKA (RADIOELEKTRONIKA), 2013, : 236 - 239
  • [22] Productivity and Performance Portability of the OpenMP 3.0 Tasking Concept When Applied to an Engineering Code Written in Fortran 95
    Paul Kapinos
    Dieter an Mey
    International Journal of Parallel Programming, 2010, 38 : 379 - 395
  • [23] Productivity and Performance Portability of the OpenMP 3.0 Tasking Concept When Applied to an Engineering Code Written in Fortran 95
    Kapinos, Paul
    an Mey, Dieter
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2010, 38 (5-6) : 379 - 395
  • [24] Early Experiences Porting Three Applications to OpenMP 4.5
    Karlin, Ian
    Scogland, Tom
    Jacob, Arpith C.
    Antao, Samuel F.
    Bercea, Gheorghe-Teodor
    Bertolli, Carlo
    de Supinski, Bronis R.
    Draeger, Erik W.
    Eichenberger, Alexandre E.
    Glosli, Jim
    Jones, Holger
    Kunen, Adam
    Poliakoff, David
    Richards, David F.
    OPENMP: MEMORY, DEVICES, AND TASKS, 2016, 9903 : 281 - 292
  • [25] OpenMP 4.5 Validation and Verification Suite for Device Offload
    Diaz, Jose Monsalve
    Pophale, Swaroop
    Hernandez, Oscar
    Bernholdt, David E.
    Chandrasekaran, Sunita
    EVOLVING OPENMP FOR EVOLVING ARCHITECTURES, 2018, 11128 : 82 - 95
  • [26] Analysis of OpenMP 4.5 Offloading in Implementations: Correctness and Overhead
    Diaz, Jose Monsalve
    Friedline, Kyle
    Pophale, Swaroop
    Hernandez, Oscar
    Bernholdt, David E.
    Chandrasekaran, Sunita
    PARALLEL COMPUTING, 2019, 89
  • [27] A Comparative Study on Performance Benefits of Multi-core CPUs using OpenMP
    Saravanan, Vijayalakshmi
    Radhakrishnan, Mohan
    Basavesh, A.S.
    Kothari, D.P.
    International Journal of Computer Science Issues, 2012, 9 (1 1-2) : 272 - 278
  • [28] DESIGN LIBRARY PORTABILITY - A CASE-STUDY
    CONQ, B
    ETIENNE, R
    PEREZSEGOVIA, T
    IFIP TRANSACTIONS A-COMPUTER SCIENCE AND TECHNOLOGY, 1993, 22 : 427 - 436
  • [29] A Study of Parallelization and Performance Optimizations Based on OpenMP
    Shen, Hua
    Zhou, Guoshun
    Yan, HuiQi
    MECHATRONICS AND INDUSTRIAL INFORMATICS, PTS 1-4, 2013, 321-324 : 2933 - 2937
  • [30] Portability efficiency approach for calculating performance portability
    Marowka, Ami
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2025, 170