Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Cited by: 0
Authors
Sourouri, Mohammed [1, 2]
Baden, Scott B. [3]
Cai, Xing [1, 2]
Affiliations
[1] Simula Res Lab, Oslo, Norway
[2] Univ Oslo, Dept Informat, Oslo, Norway
[3] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA
Keywords
Source-to-source translation; Code generation; Code optimization; CUDA; OpenMP; MPI; Stencil computation; Heterogeneous computing; CPU plus GPU computing; CODE;
DOI
10.1007/s10766-016-0454-1
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of various data motion by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.
Pages: 711 - 729
Number of pages: 19
Related papers
50 items in total
  • [31] GPU-accelerated 3D Bayesian image reconstruction from Compton scattered data
    Van-Giang Nguyen
    Lee, Soo-Jin
    Lee, Mi No
    PHYSICS IN MEDICINE AND BIOLOGY, 2011, 56 (09) : 2817 - 2836
  • [32] GPU-Accelerated 3D Mesh Deformation for Optimization Based on the Finite Element Method
    Lamecki, Adam
    Dziekonski, Adam
    Balewski, Lukasz
    Fotyga, Grzegorz
    Mrozowski, Michal
    RADIOENGINEERING, 2017, 26 (04) : 924 - 929
  • [33] The Impact of 3D Stacking on GPU-Accelerated Deep Neural Networks: an Experimental Study
    Wahby, William
    Sarvey, Thomas
    Sharma, Hardik
    Esmaeilzadeh, Hadi
    Bakir, Muhannad S.
    2016 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC), 2016,
  • [34] A GPU-Accelerated Method for 3D Nonlinear Kelvin Ship Wake Patterns Simulation
    Sun, Xiaofeng
    Cai, Miaoyu
    Ding, Junchen
    APPLIED SCIENCES-BASEL, 2023, 13 (22):
  • [35] GPU-accelerated 3D volumetric X-ray-induced acoustic computed tomography
    Lee, Donghyun
    Park, Eun-Yeong
    Choi, Seongwook
    Kim, Hyeongsub
    Min, Jung-joon
    Lee, Changho
    Kim, Chulhong
    BIOMEDICAL OPTICS EXPRESS, 2020, 11 (02) : 752 - 761
  • [36] 3D Radiative Transfer for Exoplanet Atmospheres. gCMCRT: A GPU-accelerated MCRT Code
    Lee, Elspeth K. H.
    Wardenier, Joost P.
    Prinoth, Bibiana
    Parmentier, Vivien
    Grimm, Simon L.
    Baeyens, Robin
    Carone, Ludmila
    Christie, Duncan
    Deitrick, Russell
    Kitzmann, Daniel
    Mayne, Nathan
    Roman, Michael
    Thorsbro, Brian
    ASTROPHYSICAL JOURNAL, 2022, 929 (02):
  • [37] Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method
    徐琪
    余纲林
    王侃
    孙嘉龙
    Nuclear Science and Techniques, 2014, 25 (01) : 61 - 65
  • [38] GPU-accelerated real-time 3D tracking for humanoid locomotion and stair climbing
    Michel, Philipp
    Chestnutt, Joel
    Kagami, Satoshi
    Nishiwaki, Koichi
    Kuffner, James
    Kanade, Takeo
    2007 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-9, 2007, : 469 - +
  • [39] GPU-accelerated Computation of 3D laser radar range imaging of arbitrary coarse targets
    Lin, Jiaxuan
    Wu, Zhensen
    Su, Xiang
    Wu, Jiaji
    Wang, Biao
    Cao, Yunhua
    PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012, : 868 - 872
  • [40] Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method
    Xu Qi
    Yu Gang-Lin
    Wang Kan
    Sun Jia-Long
    NUCLEAR SCIENCE AND TECHNIQUES, 2014, 25 (01)