Autotuning CUDA compiler parameters for heterogeneous applications using the OpenTuner framework

Times cited: 8
Authors
Bruel, Pedro [1]
Amaris, Marcos [1]
Goldman, Alfredo [1]
Affiliation
[1] Univ Sao Paulo, IME, R Matao 1010, Cidade Univ, Sao Paulo, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
autotuning; GPUs; compilers; CUDA; OpenTuner; performance analysis; model
DOI
10.1002/cpe.3973
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. The enormous heterogeneity of parallel computing platforms motivates the development of automated optimization tools and techniques. The Algorithm Selection Problem consists of finding a combination of algorithms, or a configuration of an algorithm, that optimizes the solution of a set of problem instances. An autotuner solves the Algorithm Selection Problem using search and optimization techniques. In this paper, we implement an autotuner for the parameters of the Compute Unified Device Architecture (CUDA) compiler using the OpenTuner framework. The autotuner searches for a set of compilation parameters that minimizes the time to solve a problem. We analyze the speedups achieved, relative to the compiler's high-level optimizations, on three different GPU devices for 17 heterogeneous GPU applications, 12 of which are from the Rodinia Benchmark Suite. The autotuner often beats the compiler's high-level optimizations but underperforms for some problems. We achieved over 2x speedup for Gaussian Elimination and almost 2x speedup for Heart Wall, both from the Rodinia Benchmark Suite, and over 4x speedup for a matrix multiplication algorithm. Copyright (c) 2017 John Wiley & Sons, Ltd.
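The search loop described in the abstract (propose a set of compiler flags, measure the time to solve the problem, keep the fastest configuration) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain random search in place of OpenTuner's ensemble of search techniques, a small hypothetical search space built from a few real nvcc options (not the paper's parameter set), and a hypothetical `synthetic_cost` function that stands in for compiling, running, and timing the application.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]

# Hypothetical search space built from a few real nvcc options. The empty
# string means "leave this option at nvcc's default". This is NOT the
# parameter set used in the paper.
SEARCH_SPACE: Dict[str, List[str]] = {
    "fast_math": ["", "--use_fast_math"],
    "ftz": ["--ftz=false", "--ftz=true"],
    "prec_div": ["--prec-div=true", "--prec-div=false"],
    "maxrregcount": ["", "--maxrregcount=32", "--maxrregcount=64"],
}


def to_flags(config: Config) -> List[str]:
    """Turn a configuration into an nvcc argument list, skipping defaults."""
    return [flag for flag in config.values() if flag]


def synthetic_cost(config: Config) -> float:
    """Hypothetical stand-in for 'compile with these flags, run, return seconds'.

    A real autotuner would invoke nvcc with to_flags(config) and time the binary.
    """
    seconds = 10.0
    if config["fast_math"]:
        seconds -= 3.0
    if config["ftz"] == "--ftz=true":
        seconds -= 1.0
    if config["prec_div"] == "--prec-div=false":
        seconds -= 0.5
    seconds += {"": 0.0, "--maxrregcount=32": 1.0, "--maxrregcount=64": -2.0}[
        config["maxrregcount"]
    ]
    return seconds


def random_search(
    space: Dict[str, List[str]],
    cost: Callable[[Config], float],
    budget: int,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Sample `budget` random configurations and keep the fastest one seen."""
    rng = random.Random(seed)
    # Start from the first choice of every parameter as the baseline.
    best = {name: choices[0] for name, choices in space.items()}
    best_cost = cost(best)
    for _ in range(budget):
        candidate = {name: rng.choice(choices) for name, choices in space.items()}
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


def exhaustive_search(
    space: Dict[str, List[str]], cost: Callable[[Config], float]
) -> Tuple[Config, float]:
    """Evaluate every configuration; only feasible for tiny spaces."""
    names = list(space)
    configs = (dict(zip(names, values)) for values in itertools.product(*space.values()))
    return min(((c, cost(c)) for c in configs), key=lambda pair: pair[1])


if __name__ == "__main__":
    best, seconds = random_search(SEARCH_SPACE, synthetic_cost, budget=50)
    print("best flags:", " ".join(to_flags(best)), f"({seconds:.1f} s)")
```

Exhaustive enumeration is included only to check the sketch on a 24-point space. Realistic compiler flag spaces grow combinatorially, which is why a framework such as OpenTuner coordinates several search techniques rather than enumerating every configuration.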
Pages: 15