Autotuning CUDA compiler parameters for heterogeneous applications using the OpenTuner framework

Cited by: 8
Authors
Bruel, Pedro [1]
Amaris, Marcos [1]
Goldman, Alfredo [1]
Affiliations
[1] Univ Sao Paulo, IME, R Matao 1010,Cidade Univ, Sao Paulo, SP, Brazil
Source
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
autotuning; GPUs; compilers; CUDA; OpenTuner; PERFORMANCE ANALYSIS; MODEL;
DOI
10.1002/cpe.3973
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. The enormous heterogeneity of parallel computing platforms justifies and motivates the development of automated optimization tools and techniques. The Algorithm Selection Problem consists in finding a combination of algorithms, or a configuration of an algorithm, that optimizes the solution of a set of problem instances. An autotuner solves the Algorithm Selection Problem using search and optimization techniques. In this paper, we implement an autotuner for the Compute Unified Device Architecture compiler's parameters using the OpenTuner framework. The autotuner searches for a set of compilation parameters that optimizes the time to solve a problem. We analyze the performance speedups, in comparison with high-level compiler optimizations, achieved in three different GPU devices, for 17 heterogeneous GPU applications, 12 of which are from the Rodinia Benchmark Suite. The autotuner often beats the compiler's high-level optimizations, but underperforms for some problems. We achieved over 2x speedup for Gaussian Elimination and almost 2x speedup for Heart Wall, both problems from the Rodinia Benchmark, and over 4x speedup for a matrix multiplication algorithm. Copyright (c) 2017 John Wiley & Sons, Ltd.
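The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the OpenTuner API: plain random search stands in for OpenTuner's search techniques, the listed nvcc options are only a small illustrative subset rather than the paper's search space, and `synthetic_runtime` is a hypothetical cost function that replaces the real step of compiling with nvcc, running the application, and timing it.

```python
import random

# Illustrative subset of nvcc options; NOT the paper's actual search space.
SEARCH_SPACE = {
    "-O": [0, 1, 2, 3],
    "--use_fast_math": [False, True],
    "--ftz": ["true", "false"],
    "--fmad": ["true", "false"],
    "--maxrregcount": [16, 32, 64, 128],
}


def to_flags(config):
    """Render a configuration dict as a list of nvcc command-line arguments."""
    flags = []
    for name, value in config.items():
        if isinstance(value, bool):
            if value:  # boolean switches are either present or absent
                flags.append(name)
        elif name == "-O":
            flags.append(f"-O{value}")
        else:
            flags.append(f"{name}={value}")
    return flags


def random_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def synthetic_runtime(config):
    """Hypothetical stand-in for 'compile with nvcc, run, and time it'."""
    t = 10.0 - config["-O"]
    if config["--use_fast_math"]:
        t -= 1.5
    t += abs(config["--maxrregcount"] - 64) / 64
    return t


def autotune(measure, trials=200, seed=0):
    """Random search: keep the configuration with the lowest measured time."""
    rng = random.Random(seed)
    best_config, best_time = None, float("inf")
    for _ in range(trials):
        config = random_config(rng)
        elapsed = measure(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


best, runtime = autotune(synthetic_runtime)
print(to_flags(best), runtime)
```

In a real setting, the `measure` argument would compile the application with `to_flags(config)` and return its wall-clock run time, which makes each trial far more expensive than in this toy version.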
Pages: 15
Related papers
50 in total
  • [21] YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs
    Lovergine, Silvia
    Tumeo, Antonino
    Villa, Oreste
    Ferrandi, Fabrizio
    RAPID SYSTEM PROTOTYPING: SHORTENING THE PATH FROM SPECIFICATION TO PROTOTYPE (RSP 2013), 2013, : 123 - 129
  • [22] Evaluation of NDVI and NDWI parameters in CPU-GPU Heterogeneous Platforms based CUDA
    Guerrouj, Fatima Zahra
    Latif, Rachid
    Saddik, Amine
    PROCEEDINGS OF 2020 5TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND ARTIFICIAL INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS (CLOUDTECH'20), 2020, : 74 - 79
  • [23] Performance Measurement of Applications with GPU Acceleration using CUDA
    Mayanglambam, Shangkar
    Malony, Allen D.
    Sottile, Matthew J.
    PARALLEL COMPUTING: FROM MULTICORES AND GPU'S TO PETASCALE, 2010, 19 : 341 - 348
  • [24] Loading OpenMP to Cell: An effective compiler framework for heterogeneous multi-core chip
    Wei, Haitao
    Yu, Junqing
    PRACTICAL PROGRAMMING MODEL FOR THE MULTI-CORE ERA, PROCEEDINGS, 2008, 4935 : 129 - 133
  • [25] COALA: A Compiler-Assisted Adaptive Library Routines Allocation Framework for Heterogeneous Systems
    Cai, Qinyun
    Tan, Guanghua
    Yang, Wangdong
    He, Xianhao
    Yan, Yuwei
    Li, Keqin
    Li, Kenli
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (07) : 1724 - 1737
  • [26] Supercomputer Modelling of Spatially-heterogeneous Coagulation using MPI and CUDA
    Zagidullin, Rishat
    Smirnov, Alexander
    Matveev, Sergey
    Tyrtyshnikov, Eugene
    SUPERCOMPUTING (RUSCDAYS 2019), 2019, 1129 : 403 - 414
  • [27] Design of a Parallel AES for Graphics Hardware using the CUDA framework
    Di Biagio, Andrea
    Barenghi, Alessandro
    Agosta, Giovanni
    Pelosi, Gerardo
    2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, : 3139 - +
  • [28] Fast Autotuning Configurations of Parameters in Distributed Computing Systems Using Ordinal Optimization
    Zhang, Fan
    Cao, Junwei
    Liu, Lianchen
    Wu, Cheng
    2009 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPPW 2009), 2009, : 190 - +
  • [29] An Adaptive Heterogeneous Runtime Framework for Irregular Applications
    Chih-Chen Kao
    Wei-Chung Hsu
    Journal of Signal Processing Systems, 2015, 80 : 245 - 259
  • [30] An Adaptive Heterogeneous Runtime Framework for Irregular Applications
    Kao, Chih-Chen
    Hsu, Wei-Chung
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2015, 80 (03) : 245 - 259