Autotuning CUDA compiler parameters for heterogeneous applications using the OpenTuner framework

Times cited: 8
Authors
Bruel, Pedro [1]
Amaris, Marcos [1]
Goldman, Alfredo [1]
Affiliation
[1] Univ Sao Paulo, IME, R Matao 1010, Cidade Univ, Sao Paulo, SP, Brazil
Source
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
autotuning; GPUs; compilers; CUDA; OpenTuner; performance analysis; model
DOI
10.1002/cpe.3973
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. The enormous heterogeneity of parallel computing platforms justifies and motivates the development of automated optimization tools and techniques. The Algorithm Selection Problem consists of finding a combination of algorithms, or a configuration of an algorithm, that optimizes the solution of a set of problem instances. An autotuner solves the Algorithm Selection Problem using search and optimization techniques. In this paper, we implement an autotuner for the parameters of the Compute Unified Device Architecture (CUDA) compiler using the OpenTuner framework. The autotuner searches for a set of compilation parameters that minimizes the time to solve a problem. We analyze the speedups achieved, relative to the compiler's high-level optimizations, on three different GPU devices for 17 heterogeneous GPU applications, 12 of which are from the Rodinia Benchmark Suite. The autotuner often beats the compiler's high-level optimizations but underperforms for some problems. We achieved over 2x speedup for Gaussian Elimination and almost 2x speedup for Heart Wall, both from the Rodinia Benchmark Suite, and over 4x speedup for a matrix multiplication algorithm. Copyright (c) 2017 John Wiley & Sons, Ltd.
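The abstract describes the autotuner only at a high level: it proposes sets of CUDA compiler parameters, measures the resulting run time, and keeps the fastest configuration. The Python sketch below illustrates that measure-and-keep-the-best loop under stated assumptions. It uses plain random search, a much simpler strategy than the search techniques OpenTuner provides, and it does not use the OpenTuner API. The flag subset in `SEARCH_SPACE` is an illustrative selection of real `nvcc` options, not the paper's exact search space; `synthetic_runtime` is a hypothetical cost model so the example runs without a GPU; and `measure_with_nvcc` is a hypothetical helper showing how a real compile-and-time measurement could be plugged in.

```python
import random
import subprocess
import time
from typing import Callable, Dict, List, Tuple, Union

Value = Union[bool, int]
Config = Dict[str, Value]

# Illustrative subset of real nvcc options (assumption: NOT the paper's
# exact search space). The first value in each list is the baseline.
SEARCH_SPACE: Dict[str, List[Value]] = {
    "use_fast_math": [False, True],
    "ftz": [False, True],
    "prec-div": [True, False],
    "prec-sqrt": [True, False],
    "fmad": [True, False],
    "maxrregcount": [32, 64, 128, 255],
}


def to_nvcc_flags(cfg: Config) -> List[str]:
    """Translate a configuration into nvcc command-line arguments."""
    flags: List[str] = []
    for name, value in cfg.items():
        if name == "use_fast_math":
            if value:  # plain switch, emitted only when enabled
                flags.append("--use_fast_math")
        elif isinstance(value, bool):  # check bool before int (bool is an int)
            flags.append(f"--{name}={'true' if value else 'false'}")
        else:
            flags.append(f"--{name}={value}")
    return flags


def measure_with_nvcc(cfg: Config, source: str, exe: str = "./a.out") -> float:
    """Hypothetical helper: compile `source`, run it once, return wall time.

    Requires nvcc and a CUDA-capable GPU; not called in the demo below.
    """
    subprocess.run(["nvcc", *to_nvcc_flags(cfg), "-o", exe, source], check=True)
    start = time.perf_counter()
    subprocess.run([exe], check=True)
    return time.perf_counter() - start


def synthetic_runtime(cfg: Config) -> float:
    """Hypothetical cost model standing in for a real measurement."""
    t = 10.0
    t -= 2.0 if cfg["use_fast_math"] else 0.0
    t -= 0.5 if cfg["ftz"] else 0.0
    t -= 0.0 if cfg["prec-div"] else 0.7
    t -= 0.0 if cfg["prec-sqrt"] else 0.3
    t -= 0.5 if cfg["fmad"] else 0.0
    t += abs(int(cfg["maxrregcount"]) - 64) / 64.0  # penalty away from 64
    return t


def random_search(
    measure: Callable[[Config], float], budget: int, seed: int = 0
) -> Tuple[Config, float, float]:
    """Return (best config, best time, baseline time).

    The baseline is measured first, so the result is never slower than it.
    """
    rng = random.Random(seed)
    baseline = {name: values[0] for name, values in SEARCH_SPACE.items()}
    base_time = measure(baseline)
    best_cfg, best_time = baseline, base_time
    for _ in range(budget):
        cfg = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        t = measure(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time, base_time


if __name__ == "__main__":
    cfg, best, base = random_search(synthetic_runtime, budget=100)
    print("flags:", " ".join(to_nvcc_flags(cfg)))
    print(f"speedup over baseline: {base / best:.2f}x")
```

Replacing `synthetic_runtime` with something like `functools.partial(measure_with_nvcc, source="kernel.cu")` (a hypothetical file name) would run the same loop against real compilations. Measuring the baseline first is a deliberate choice: it guarantees the reported speedup is never below 1x, although a real autotuner would also repeat each measurement to reduce timing noise.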
Pages: 15