A Case Study for Performance Portability Using OpenMP 4.5

Times cited: 19

Authors:

Gayatri, Rahulkumar [1]

Yang, Charlene [1]

Kurth, Thorsten [1]

Deslippe, Jack [1]

Affiliation:

[1] Lawrence Berkeley Natl Lab LBNL, Natl Energy Res Sci Comp Ctr NERSC, Berkeley, CA 94720 USA

Source:

ACCELERATOR PROGRAMMING USING DIRECTIVES | 2019 / Vol. 11381

Keywords:

OpenMP 3.0; OpenMP 4.5; OpenACC; CUDA; Parallel programming models; P100; V100; Xeon Phi; Haswell

DOI:

10.1007/978-3-030-12274-4_4

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

In recent years, the HPC landscape has shifted away from traditional multi-core CPU systems toward energy-efficient architectures, such as many-core CPUs and accelerators like GPUs, to achieve high performance. The goal of performance portability is to enable developers to rapidly produce applications that run efficiently on a variety of these architectures with few or no architecture-specific code adaptations. We implement a key kernel from a material science application using OpenMP 3.0, OpenMP 4.5, OpenACC, and CUDA on Intel architectures (Xeon and Xeon Phi) and NVIDIA GPUs (P100 and V100). We compare the performance of the OpenMP 4.5 implementation with that of the more architecture-specific implementations, examine the performance of the OpenMP 4.5 implementation on CPUs after back-porting, share our experience optimizing large reduction loops, and discuss the latest compiler status for OpenMP 4.5 and OpenACC.


Pages: 75-95

Number of pages: 21
