Automatic translation of data parallel programs for heterogeneous parallelism through OpenMP offloading

Cited by: 0
Authors
Farui Wang
Weizhe Zhang
Haonan Guo
Meng Hao
Gangzhao Lu
Zheng Wang
Affiliations
[1] Harbin Institute of Technology, School of Computer Science and Technology
[2] University of Leeds, School of Computing
Keywords
Heterogeneous computing; Source-to-source translation; OpenMP offloading; Compilation optimization; GPUs;
DOI
Not available
Abstract
Heterogeneous multicores like GPGPUs are now commonplace in modern computing systems. Although heterogeneous multicores offer the potential for high performance, programmers are struggling to program such systems. This paper presents OAO, a compiler-based approach to automatically translate shared-memory OpenMP data-parallel programs to run on heterogeneous multicores through OpenMP offloading directives. Given the large user base of shared memory OpenMP programs, our approach allows programmers to continue using a single-source-based programming language that they are familiar with while benefiting from the heterogeneous performance. OAO introduces a novel runtime optimization scheme to automatically eliminate unnecessary host–device communication to minimize the communication overhead between the host and the accelerator device. We evaluate OAO by applying it to 23 benchmarks from the PolyBench and Rodinia suites on two distinct GPU platforms. Experimental results show that OAO achieves up to 32× speedup over the original OpenMP version, and can reduce the host–device communication overhead by up to 99% over the hand-translated version.
Pages: 4957-4987
Page count: 30