An Automated Tool for Analysis and Tuning of GPU-Accelerated Code in HPC Applications

Cited by: 4
Authors
Zhou, Keren [1]
Meng, Xiaozhu [1]
Sai, Ryuichi [1]
Grubisic, Dejan [1]
Mellor-Crummey, John [1]
Affiliations
[1] Rice University, Department of Computer Science, Houston, TX 77054, USA
Keywords
Graphics processing units; Optimization; Tools; Measurement; Instruments; Tuning; Registers; High performance computing; performance analysis; parallel programming; parallel architectures
DOI
10.1109/TPDS.2021.3094169
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The US Department of Energy's fastest supercomputers and forthcoming exascale systems employ Graphics Processing Units (GPUs) to increase the computational performance of compute nodes. However, the complexity of GPU architectures makes tailoring sophisticated applications to achieve high performance on GPU-accelerated systems a major challenge. At best, prior performance tools for GPU code provide only coarse-grained tuning advice at the kernel level. In this article, we describe GPA, a performance advisor that suggests potential code optimizations at a hierarchy of levels, including individual lines, loops, and functions. To gather the fine-grained measurements needed to produce such insights, GPA uses instruction sampling and binary instrumentation to monitor execution of GPU code. At the time of this writing, GPU instruction sampling is only available on NVIDIA GPUs. To understand performance losses, GPA uses data flow analysis to approximately attribute measured instruction stalls back to their causes. GPA then analyzes patterns of stalls using information about a program's structure and the GPU architecture to identify optimization strategies that address the observed inefficiencies. Finally, GPA employs detailed performance models to estimate the potential speedup that each optimization might provide. Experiments with benchmarks and applications show that GPA provides useful advice for tuning GPU code. We applied GPA to analyze and tune a collection of codes on NVIDIA V100 and A100 GPUs. GPA suggested optimizations that it estimated would accelerate performance across the set of codes by a geometric mean of 1.21x. Applying the suggested optimizations accelerated these codes by a geometric mean of 1.19x.
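The abstract summarizes estimated and achieved speedups as geometric means. As a minimal sketch of how such a summary figure is computed, the Python snippet below averages per-code speedup ratios in log space. The function name `geometric_mean` and the sample values are hypothetical illustrations and are not data or code from the paper.

```python
import math


def geometric_mean(speedups):
    """Return the geometric mean of a sequence of positive speedup ratios."""
    values = list(speedups)
    if not values:
        raise ValueError("need at least one speedup")
    if any(s <= 0 for s in values):
        raise ValueError("speedups must be positive")
    # Average the logarithms, then exponentiate; this equals the n-th root
    # of the product but avoids overflow for long lists.
    return math.exp(sum(math.log(s) for s in values) / len(values))


if __name__ == "__main__":
    # Hypothetical per-code speedups, NOT measurements from the paper.
    estimated = [1.10, 1.25, 1.30]
    achieved = [1.08, 1.22, 1.28]
    print(f"estimated geometric mean: {geometric_mean(estimated):.2f}x")
    print(f"achieved geometric mean:  {geometric_mean(achieved):.2f}x")
```

The geometric mean is the usual choice for averaging ratios such as speedups, because a 2x gain on one code and a 0.5x loss on another then average to exactly 1x.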
Pages: 854-865
Page count: 12