A Case Study for Performance Portability Using OpenMP 4.5

Cited by: 19
Authors
Gayatri, Rahulkumar [1]
Yang, Charlene [1]
Kurth, Thorsten [1]
Deslippe, Jack [1]
Affiliation
[1] Lawrence Berkeley National Laboratory (LBNL), National Energy Research Scientific Computing Center (NERSC), Berkeley, CA 94720, USA
Keywords
OpenMP; 3.0; 4.5; OpenACC; CUDA; Parallel programming models; P100; V100; Xeon Phi; Haswell
DOI
10.1007/978-3-030-12274-4_4
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812 (Computer Science and Technology)
Abstract
In recent years, the HPC landscape has shifted away from traditional multi-core CPU systems toward energy-efficient architectures, such as many-core CPUs and accelerators like GPUs, to achieve high performance. The goal of performance portability is to enable developers to rapidly produce applications that run efficiently on a variety of these architectures with little or no architecture-specific code adaptation. We implement a key kernel from a materials science application using OpenMP 3.0, OpenMP 4.5, OpenACC, and CUDA on Intel architectures (Xeon and Xeon Phi) and NVIDIA GPUs (P100 and V100). We compare the performance of the OpenMP 4.5 implementation with that of the more architecture-specific implementations, examine the performance of the OpenMP 4.5 implementation on CPUs after back-porting, share our experience optimizing large reduction loops, and discuss the latest compiler status for OpenMP 4.5 and OpenACC.
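The abstract contrasts an OpenMP 3.0 implementation that targets CPU threads with an OpenMP 4.5 implementation that offloads work to a device, and it highlights large reduction loops. The paper's actual kernel is not reproduced in this record, so the sketch below uses a hypothetical dot-product reduction (`dot_omp3` and `dot_omp45` are illustrative names, not the authors' code) to show how the same loop is annotated in each model. It assumes a C99 compiler; when built without OpenMP the pragmas are ignored and both functions run serially, and when built with OpenMP but no offload device the target region falls back to the host.

```c
#include <assert.h>

/* Hypothetical example, not the paper's kernel.
   OpenMP 3.0 style: CPU threads split the loop; each thread keeps a
   private partial sum, and the partial sums are combined at the end. */
double dot_omp3(const double *a, const double *b, long n)
{
    assert(n >= 0);
    double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical example, not the paper's kernel.
   OpenMP 4.5 style: the same loop offloaded to a device (e.g. a GPU).
   teams/distribute spreads iterations across teams of threads,
   map(to:) copies the inputs to the device, and map(tofrom:) copies the
   reduced scalar back to the host. */
double dot_omp45(const double *a, const double *b, long n)
{
    assert(n >= 0);
    double sum = 0.0;
#pragma omp target teams distribute parallel for \
    map(to : a[0:n], b[0:n]) map(tofrom : sum) reduction(+ : sum)
    for (long i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Under OpenMP 4.5, a scalar referenced inside a target region is firstprivate by default, so the explicit `map(tofrom : sum)` is what returns the reduced value to the host; OpenMP 5.0 later made a reduction on a combined target construct imply that mapping.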
Pages: 75-95
Number of pages: 21
Related papers
50 in total
  • [1] The design and implementation of OpenMP 4.5 and OpenACC backends for the RAJA C++ performance portability layer
    Killian, William
    Scogland, Tom
    Kunen, Adam
    Cavazos, John
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, 10732 LNCS : 63 - 82
  • [2] The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer
    Killian, William
    Scogland, Tom
    Kunen, Adam
    Cavazos, John
    ACCELERATOR PROGRAMMING USING DIRECTIVES, WACCPD 2017, 2018, 10732 : 63 - 82
  • [3] Enhancing OpenMP Tasking Model: Performance and Portability
    Yu, Chenle
    Royuela, Sara
    Quinones, Eduardo
    OPENMP: ENABLING MASSIVE NODE-LEVEL PARALLELISM, IWOMP 2021, 2021, 12870 : 35 - 49
  • [4] On the Performance Portability of OpenACC, OpenMP, Kokkos and RAJA
    Marowka, Ami
    ACM International Conference Proceeding Series, 2022, : 103 - 114
  • [5] The Productivity, Portability and Performance of OpenMP 4.5 for Scientific Applications Targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs
    Martineau, Matt
    McIntosh-Smith, Simon
    SCALING OPENMP FOR EXASCALE PERFORMANCE AND PORTABILITY (IWOMP 2017), 2017, 10468 : 185 - 200
  • [6] Pragmatic Performance Portability with OpenMP 4.x
    Martineau, Matt
    Price, James
    McIntosh-Smith, Simon
    Gaudin, Wayne
    OPENMP: MEMORY, DEVICES, AND TASKS, 2016, 9903 : 253 - 267
  • [7] Performance portability of sparse matrix-vector multiplication implemented using OpenMP, OpenACC and SYCL
    Stec, Kinga
    Stpiczynski, Przemyslaw
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2025, 170
  • [8] Evaluating Performance Portability of OpenMP for SNAP on NVIDIA, Intel, and AMD GPUs Using the Roofline Methodology
    Mehta, Neil A.
    Gayatri, Rahulkumar
    Ghadar, Yasaman
    Knight, Christopher
    Deslippe, Jack
    ACCELERATOR PROGRAMMING USING DIRECTIVES, WACCPD 2020, 2021, 12655 : 3 - 24
  • [9] Evaluating the Impact of Proposed OpenMP 5.0 Features on Performance, Portability and Productivity
    Pennycook, S. J.
    Sewall, J. D.
    Hammond, J. R.
    PROCEEDINGS OF 2018 IEEE/ACM INTERNATIONAL WORKSHOP ON PERFORMANCE, PORTABILITY AND PRODUCTIVITY IN HPC (P3HPC 2018), 2018, : 37 - 46
  • [10] A Performance Portability Study Using Tensor Contraction Benchmarks
    Ozturk, M. Emin
    Asudeh, Omid
    Sabin, Gerald
    Sadayappan, P.
    Sukumaran-Rajam, Aravind
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW, 2023, : 591 - 600