Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Cited by: 0

Authors:

Sourouri, Mohammed [1,2]

Baden, Scott B. [3]

Cai, Xing [1,2]

Affiliations:

[1] Simula Res Lab, Oslo, Norway

[2] Univ Oslo, Dept Informat, Oslo, Norway

[3] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA

Source:

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING | 2017 / Vol. 45 / Issue 03

Keywords:

Source-to-source translation; Code generation; Code optimization; CUDA; OpenMP; MPI; Stencil computation; Heterogeneous computing; CPU plus GPU computing; CODE;

DOI:

10.1007/s10766-016-0454-1

Chinese Library Classification number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of various data motion by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.

Pages: 711 - 729

Page count: 19

50 items in total

[31] GPU-accelerated 3D Bayesian image reconstruction from Compton scattered data
Van-Giang Nguyen
Lee, Soo-Jin
Lee, Mi No
PHYSICS IN MEDICINE AND BIOLOGY, 2011, 56 (09) : 2817 - 2836
[32] GPU-Accelerated 3D Mesh Deformation for Optimization Based on the Finite Element Method
Lamecki, Adam
Dziekonski, Adam
Balewski, Lukasz
Fotyga, Grzegorz
Mrozowski, Michal
RADIOENGINEERING, 2017, 26 (04) : 924 - 929
[33] The Impact of 3D Stacking on GPU-Accelerated Deep Neural Networks: an Experimental Study
Wahby, William
Sarvey, Thomas
Sharma, Hardik
Esmaeilzadeh, Hadi
Bakir, Muhannad S.
2016 IEEE INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC), 2016,
[34] A GPU-Accelerated Method for 3D Nonlinear Kelvin Ship Wake Patterns Simulation
Sun, Xiaofeng
Cai, Miaoyu
Ding, Junchen
APPLIED SCIENCES-BASEL, 2023, 13 (22):
[35] GPU-accelerated 3D volumetric X-ray-induced acoustic computed tomography
Lee, Donghyun
Park, Eun-Yeong
Choi, Seongwook
Kim, Hyeongsub
Min, Jung-joon
Lee, Changho
Kim, Chulhong
BIOMEDICAL OPTICS EXPRESS, 2020, 11 (02) : 752 - 761
[36] 3D Radiative Transfer for Exoplanet Atmospheres. gCMCRT: A GPU-accelerated MCRT Code
Lee, Elspeth K. H.
Wardenier, Joost P.
Prinoth, Bibiana
Parmentier, Vivien
Grimm, Simon L.
Baeyens, Robin
Carone, Ludmila
Christie, Duncan
Deitrick, Russell
Kitzmann, Daniel
Mayne, Nathan
Roman, Michael
Thorsbro, Brian
ASTROPHYSICAL JOURNAL, 2022, 929 (02):
[37] Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method
徐琪
余纲林
王侃
孙嘉龙
Nuclear Science and Techniques, 2014, 25 (01) : 61 - 65
[38] GPU-accelerated real-time 3D tracking for humanoid locomotion and stair climbing
Michel, Philipp
Chestnutt, Joel
Kagami, Satoshi
Nishiwaki, Koichi
Kuffner, James
Kanade, Takeo
2007 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-9, 2007, : 469 - +
[39] GPU-accelerated Computation of 3D laser radar range imaging of arbitrary coarse targets
Lin, Jiaxuan
Wu, Zhensen
Su, Xiang
Wu, Jiaji
Wang, Biao
Cao, Yunhua
PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012, : 868 - 872
[40] Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method
Xu Qi
Yu Gang-Lin
Wang Kan
Sun Jia-Long
NUCLEAR SCIENCE AND TECHNIQUES, 2014, 25 (01)
