Autotuning CUDA compiler parameters for heterogeneous applications using the OpenTuner framework

Cited by: 8
Authors
Bruel, Pedro [1]
Amaris, Marcos [1]
Goldman, Alfredo [1]
Affiliations
[1] Univ Sao Paulo, IME, R Matao 1010, Cidade Univ, Sao Paulo, SP, Brazil
Source
Funding
São Paulo Research Foundation (Brazil);
Keywords
autotuning; GPUs; compilers; CUDA; OpenTuner; performance analysis; model;
DOI
10.1002/cpe.3973
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202; 0835;
Abstract
A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. The enormous heterogeneity of parallel computing platforms justifies and motivates the development of automated optimization tools and techniques. The Algorithm Selection Problem consists of finding a combination of algorithms, or a configuration of an algorithm, that optimizes the solution of a set of problem instances. An autotuner solves the Algorithm Selection Problem using search and optimization techniques. In this paper, we implement an autotuner for the parameters of the Compute Unified Device Architecture (CUDA) compiler using the OpenTuner framework. The autotuner searches for a set of compilation parameters that minimizes the time to solve a problem. We analyze the speedups achieved, relative to the compiler's high-level optimizations, on three different GPU devices for 17 heterogeneous GPU applications, 12 of which are from the Rodinia Benchmark Suite. The autotuner often beats the compiler's high-level optimizations but underperforms on some problems. We achieved over 2x speedup for Gaussian Elimination and almost 2x speedup for Heart Wall, both problems from the Rodinia Benchmark, and over 4x speedup for a matrix multiplication algorithm. Copyright (c) 2017 John Wiley & Sons, Ltd.
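The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use OpenTuner: plain random search stands in for OpenTuner's ensemble of search techniques. The flags in `SEARCH_SPACE` are real nvcc options, but which ones to tune, and their value ranges, are assumptions; the abstract does not list the parameters the paper tuned. The names `to_flags`, `random_search`, and `fake_measure` are hypothetical, and `fake_measure` is a synthetic cost model that exists only so the sketch runs without a GPU.

```python
import random

# Assumed search space: a small subset of real nvcc options.
# The abstract does not say which parameters the paper tuned.
SEARCH_SPACE = {
    "--use_fast_math": [False, True],       # on/off switch
    "--ftz": ["true", "false"],
    "--prec-div": ["true", "false"],
    "--fmad": ["true", "false"],
    "--maxrregcount": [16, 32, 48, 64],
}


def to_flags(config):
    """Turn a configuration dict into a list of nvcc command-line flags."""
    flags = []
    for name, value in config.items():
        if isinstance(value, bool):
            if value:                        # boolean switches appear only when on
                flags.append(name)
        else:
            flags.append(f"{name}={value}")
    return flags


def random_search(measure, trials=50, seed=0):
    """Return (best_config, best_time) after `trials` random samples.

    `measure` maps a list of flags to a run time in seconds. In a real
    tuner it would compile with nvcc and time the resulting binary.
    """
    rng = random.Random(seed)
    best_config, best_time = None, float("inf")
    for _ in range(trials):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        elapsed = measure(to_flags(config))
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


def fake_measure(flags):
    """Synthetic cost model (hypothetical), used only to make the sketch runnable."""
    cost = 10.0
    if "--use_fast_math" in flags:
        cost -= 3.0
    if "--maxrregcount=32" in flags:
        cost -= 2.0
    return cost


if __name__ == "__main__":
    config, seconds = random_search(fake_measure, trials=200, seed=1)
    print(to_flags(config), seconds)
```

To tune a real program, `fake_measure` would be replaced by a function that invokes nvcc with the given flags and times the compiled binary. A framework such as OpenTuner would take the place of `random_search`.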
Pages: 15
Related papers
50 in total
  • [1] OpenTuner: An Extensible Framework for Program Autotuning
    Ansel, Jason
    Kamil, Shoaib
    Veeramachaneni, Kalyan
    Ragan-Kelley, Jonathan
    Bosboom, Jeffrey
    O'Reilly, Una-May
    Amarasinghe, Saman
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'14), 2014, : 303 - 315
  • [2] COBAYN: Compiler Autotuning Framework Using Bayesian Networks
    Ashouri, Amir Hossein
    Mariani, Giovanni
    Palermo, Gianluca
    Park, Eunjung
    Cavazos, John
    Silvano, Cristina
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2016, 13 (02)
  • [3] Piecewise Holistic Autotuning of Compiler and Runtime Parameters
    Popov, Mihail
    Akel, Chadi
    Jalby, William
    Castro, Pablo de Oliveira
    EURO-PAR 2016: PARALLEL PROCESSING, 2016, 9833 : 238 - 250
  • [4] Autotuning High-Level Synthesis for FPGAs Using OpenTuner and LegUp
    Bruel, Pedro
    Goldman, Alfredo
    Chalamalasetti, Sai Rahul
    Milojicic, Dejan
    2017 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2017,
  • [5] SOCRATES - A Seamless Online Compiler and System Runtime AutoTuning Framework for Energy-Aware Applications
    Gadioli, Davide
    Nobre, Ricardo
    Pinto, Pedro
    Vitali, Emanuele
    Ashouri, Amir H.
    Palermo, Gianluca
    Cardoso, Joao
    Silvano, Cristina
    PROCEEDINGS OF THE 2018 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2018, : 1143 - 1146
  • [6] A Survey on Compiler Autotuning using Machine Learning
    Ashouri, Amir H.
    Killian, William
    Cavazos, John
    Palermo, Gianluca
    Silvano, Cristina
    ACM COMPUTING SURVEYS, 2019, 51 (05)
  • [7] A Script-Based Autotuning Compiler System to Generate High-Performance CUDA Code
    Khan, Malik
    Basu, Protonu
    Rudy, Gabe
    Hall, Mary
    Chen, Chun
    Chame, Jacqueline
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2013, 9 (04)
  • [8] OMPCUDA: OpenMP Execution Framework for CUDA Based on Omni OpenMP Compiler
    Ohshima, Satoshi
    Hirasawa, Shoichi
    Honda, Hiroki
    BEYOND LOOP LEVEL PARALLELISM IN OPENMP: ACCELERATORS, TASKING AND MORE, PROCEEDINGS, 2010, 6132 : 161 - +
  • [9] Autotuning of MPI Applications Using PTF
    Sikora, Anna
    Cesar, Eduardo
    Compres, Isaias
    Gerndt, Michael
    PROCEEDINGS OF THE ACM WORKSHOP ON SOFTWARE ENGINEERING METHODS FOR PARALLEL AND HIGH PERFORMANCE APPLICATIONS (SEM4HPC'16), 2016, : 31 - 38
  • [10] Compiler-Managed Replication of CUDA Kernels for Reliable Execution of GPGPU Applications
    Kaya, Ercument
    Oz, Isil
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2024, 33 (14)