Piecewise Holistic Autotuning of Compiler and Runtime Parameters

Cited by: 4
Authors
Popov, Mihail [1]
Akel, Chadi [2]
Jalby, William [1]
Castro, Pablo de Oliveira [1]
Affiliations
[1] Univ Versailles St Quentin En Yvelines, Univ Paris Saclay, Versailles, France
[2] Exascale Comp Res, Versailles, France
Source
Keywords
CODE;
DOI
10.1007/978-3-319-43659-3_18
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Current architecture complexity requires fine tuning of compiler and runtime parameters to achieve full potential performance. Autotuning substantially improves on default parameters in many scenarios, but it is a costly process requiring a long iterative evaluation. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program. Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations with the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover, unlike whole-program tuning, CERE customizes the set of best parameters for each specific OpenMP region or loop. We demonstrate CERE tuning of compiler optimizations, number of threads, and thread affinity on a NUMA architecture. On average over the NAS 3.0 benchmarks, we achieve a speedup of 1.08x after tuning. Tuning a single codelet is 13x cheaper than whole-program evaluation and estimates the tuning impact on the original region with 94.7% accuracy. On a Reverse Time Migration (RTM) proto-application, we achieve a 1.11x speedup with a 200x cheaper exploration.
Pages: 238 - 250
Page count: 13
Related papers
50 in total
  • [31] A prolog constraint handling rules compiler and runtime system
    Holzbaur, C
    Frühwirth, T
    APPLIED ARTIFICIAL INTELLIGENCE, 2000, 14 (04) : 369 - 388
  • [32] Compiler Transformations to Enable Synchronous Execution in an RIA Runtime
    Iyer, Anantharaman P. Narayana
    Chatterjee, Arijit
    Kishnani, Jyoti
    IEEE INTERNET COMPUTING, 2010, 14 (03) : 13 - 23
  • [33] COMPILER ASSISTED RUNTIME TASK SCHEDULING ON A RECONFIGURABLE COMPUTER
    Sabeghi, Mojtaba
    Sima, Vlad-Mihai
    Bertels, Koen
    FPL: 2009 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2009, : 44 - 50
  • [34] An Approach Based on a DSL plus API for Programming Runtime Adaptivity and Autotuning Concerns
    Carvalho, Tiago
    Cardoso, Joao M. P.
    33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2018, : 1211 - 1220
  • [35] Compiler and runtime support for enabling reduction computations on heterogeneous systems
    Ravi, Vignesh T.
    Ma, Wenjing
    Chiu, David
    Agrawal, Gagan
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2012, 24 (05) : 463 - 480
  • [36] The Pandore data-parallel compiler and its portable runtime
    Andre, F
    LeFur, M
    Maheo, Y
    Pazat, JL
    HIGH-PERFORMANCE COMPUTING AND NETWORKING, 1995, 919 : 176 - 183
  • [37] Compiler and runtime analysis for efficient communication in data intensive applications
    Ferreira, R
    Agrawal, G
    Saltz, J
    2001 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PROCEEDINGS, 2001, : 231 - 242
  • [38] Compiler and runtime support for adaptive sparse computations on a multithreaded architecture
    Zoppetti, GM
    Agrawal, G
    PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS, 2002, : 488 - 493
  • [39] Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs
    Kong, Martin
    Pop, Antoniu
    Pouchet, Louis-Noel
    Govindarajan, R.
    Cohen, Albert
    Sadayappan, P.
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2014, 11 (04)
  • [40] DEEPTOOLS: Compiler and Execution Runtime Extensions for RAPiD AI Accelerator
    Venkataramani, Swagath
    Choi, Jungwook
    Srinivasan, Vijayalakshmi
    Wang, Wei
    Zhang, Jintao
    Schaal, Marcel
    Serrano, Mauricio J.
    Ishizaki, Kazuaki
    Inoue, Hiroshi
    Ogawa, Eri
    Ohara, Motiyoshi
    Chang, Leland
    Gopalakrishnan, Kailash
    IEEE MICRO, 2019, 39 (05) : 102 - 111