Autotuning CUDA compiler parameters for heterogeneous applications using the OpenTuner framework

Cited by: 8

Authors:

Bruel, Pedro [1]

Amaris, Marcos [1]

Goldman, Alfredo [1]

Affiliations:

[1] Univ Sao Paulo, IME, R Matao 1010, Cidade Univ, Sao Paulo, SP, Brazil

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2017, Vol. 29, Issue 22

Funding:

São Paulo Research Foundation, Brazil;

Keywords:

autotuning; GPUs; compilers; CUDA; OpenTuner; performance analysis; model;

DOI:

10.1002/cpe.3973

Chinese Library Classification:

TP31 [Computer Software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. The enormous heterogeneity of parallel computing platforms justifies and motivates the development of automated optimization tools and techniques. The Algorithm Selection Problem consists in finding a combination of algorithms, or a configuration of an algorithm, that optimizes the solution of a set of problem instances. An autotuner solves the Algorithm Selection Problem using search and optimization techniques. In this paper, we implement an autotuner for the Compute Unified Device Architecture compiler's parameters using the OpenTuner framework. The autotuner searches for a set of compilation parameters that optimizes the time to solve a problem. We analyze the performance speedups, in comparison with high-level compiler optimizations, achieved in three different GPU devices, for 17 heterogeneous GPU applications, 12 of which are from the Rodinia Benchmark Suite. The autotuner often beats the compiler's high-level optimizations, but underperformed for some problems. We achieved over 2x speedup for Gaussian Elimination and almost 2x speedup for Heart Wall, both problems from the Rodinia Benchmark, and over 4x speedup for a matrix multiplication algorithm. Copyright (c) 2017 John Wiley & Sons, Ltd.
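
The search loop the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation and it does not use OpenTuner, which combines several search techniques; plain random search is swapped in here to keep the example self-contained. The option names (`-O`, `--maxrregcount`, `--ftz`, `--use_fast_math`) are real nvcc options, but the value sets, the file names `kernel.cu` and `./kernel_tuned`, and the `synthetic_measure` cost function are assumptions made for illustration only.

```python
import random
import subprocess
import time

# Illustrative search space (assumed values, not taken from the paper).
SEARCH_SPACE = {
    "opt_level": [0, 1, 2, 3],
    "maxrregcount": [16, 32, 64, 128],
    "ftz": [False, True],
    "use_fast_math": [False, True],
}


def to_flags(config):
    """Translate a configuration dict into a list of nvcc command-line flags."""
    flags = [
        f"-O{config['opt_level']}",
        f"--maxrregcount={config['maxrregcount']}",
        f"--ftz={'true' if config['ftz'] else 'false'}",
    ]
    if config["use_fast_math"]:
        flags.append("--use_fast_math")
    return flags


def measure_with_nvcc(config, source="kernel.cu", binary="./kernel_tuned"):
    """Compile and run once, returning wall-clock run time in seconds.

    Needs nvcc and a CUDA source file; both file names are hypothetical.
    """
    subprocess.run(["nvcc", *to_flags(config), source, "-o", binary], check=True)
    start = time.perf_counter()
    subprocess.run([binary], check=True)
    return time.perf_counter() - start


def synthetic_measure(config):
    """Made-up deterministic cost used only to exercise the search loop."""
    cost = 4.0 - 0.5 * config["opt_level"]
    cost += abs(config["maxrregcount"] - 64) / 64
    if config["use_fast_math"]:
        cost -= 0.3
    return cost


def random_search(measure, budget, seed=0):
    """Sample `budget` random configurations and keep the fastest one."""
    rng = random.Random(seed)
    best_config, best_time = None, float("inf")
    for _ in range(budget):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        elapsed = measure(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


if __name__ == "__main__":
    best, seconds = random_search(synthetic_measure, budget=50, seed=1)
    print(" ".join(to_flags(best)), seconds)
```

Passing `measure_with_nvcc` instead of `synthetic_measure` would tune a real program on a machine with the CUDA toolkit installed. A practical tuner would also repeat each measurement to reduce timing noise and compare the result against a baseline such as `-O2`.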

Pages: 15
