Automatic translation of data parallel programs for heterogeneous parallelism through OpenMP offloading

Authors
Farui Wang
Weizhe Zhang
Haonan Guo
Meng Hao
Gangzhao Lu
Zheng Wang
Affiliations
[1] Harbin Institute of Technology,School of Computer Science and Technology
[2] University of Leeds,School of Computing
Source
The Journal of Supercomputing | 2021 / Vol. 77
Keywords
Heterogeneous computing; Source-to-source translation; OpenMP offloading; Compilation optimization; GPUs;
DOI
Not available
Abstract
Heterogeneous multicores like GPGPUs are now commonplace in modern computing systems. Although heterogeneous multicores offer the potential for high performance, programmers are struggling to program such systems. This paper presents OAO, a compiler-based approach to automatically translate shared-memory OpenMP data-parallel programs to run on heterogeneous multicores through OpenMP offloading directives. Given the large user base of shared-memory OpenMP programs, our approach allows programmers to continue using a single-source-based programming language that they are familiar with while benefiting from the heterogeneous performance. OAO introduces a novel runtime optimization scheme to automatically eliminate unnecessary host–device communication to minimize the communication overhead between the host and the accelerator device. We evaluate OAO by applying it to 23 benchmarks from the PolyBench and Rodinia suites on two distinct GPU platforms. Experimental results show that OAO achieves up to 32× speedup over the original OpenMP version, and can reduce the host–device communication overhead by up to 99% over the hand-translated version.
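As a rough illustration of the kind of translation the abstract describes, the sketch below contrasts a shared-memory OpenMP loop with a hand-written offloaded equivalent, and then shows how keeping data resident on the device avoids repeated transfers. It is not OAO's output or its runtime mechanism: the function names (`scale_host`, `scale_offload`, `scale_twice_resident`) are hypothetical, and the static `target data` region merely stands in for the effect that OAO is said to achieve automatically at runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Original shared-memory form: a data-parallel loop on the host CPU. */
void scale_host(float *a, size_t n, float s) {
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        a[i] *= s;
    }
}

/* Naive offloaded form (hypothetical hand translation): every call
 * copies `a` to the device before the loop and back afterwards. */
void scale_offload(float *a, size_t n, float s) {
    #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
    for (size_t i = 0; i < n; ++i) {
        a[i] *= s;
    }
}

/* Two kernels inside one enclosing data region: `a` is copied to the
 * device once on entry and back once on exit. The inner map clauses
 * find the data already present, so they trigger no extra transfer. */
void scale_twice_resident(float *a, size_t n, float s) {
    assert(a != NULL);
    #pragma omp target data map(tofrom: a[0:n])
    {
        #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
        for (size_t i = 0; i < n; ++i) {
            a[i] *= s;
        }
        #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
        for (size_t i = 0; i < n; ++i) {
            a[i] *= s;
        }
    }
}
```

Calling `scale_offload` twice would move the array across the host-device link four times, whereas `scale_twice_resident` moves it twice. When built without OpenMP offloading support, the pragmas fall back to host execution and the functions still compute the same values.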
Pages: 4957-4987
Page count: 30