Piecewise Holistic Autotuning of Compiler and Runtime Parameters

Cited by: 4
Authors
Popov, Mihail [1]
Akel, Chadi [2]
Jalby, William [1]
Castro, Pablo de Oliveira [1]
Affiliations
[1] Univ Versailles St Quentin En Yvelines, Univ Paris Saclay, Versailles, France
[2] Exascale Comp Res, Versailles, France
Source
Keywords
CODE;
DOI
10.1007/978-3-319-43659-3_18
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The complexity of current architectures requires fine-tuning of compiler and runtime parameters to reach full potential performance. Autotuning substantially improves on default parameters in many scenarios, but it is a costly process that requires a long iterative evaluation. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program. Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations with the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover, unlike whole-program tuning, CERE customizes the set of best parameters for each specific OpenMP region or loop. We demonstrate CERE tuning of compiler optimizations, number of threads, and thread affinity on a NUMA architecture. On average over the NAS 3.0 benchmarks, we achieve a speedup of 1.08x after tuning. Tuning a single codelet is 13x cheaper than whole-program evaluation and estimates the tuning impact on the original region with 94.7% accuracy. On a Reverse Time Migration (RTM) proto-application, we achieve a 1.11x speedup with a 200x cheaper exploration.
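The contrast the abstract draws between whole-program and piecewise tuning can be illustrated with a minimal Python sketch. It is not CERE's implementation: the codelet names, the parameter space, and the replay times are hypothetical values chosen for illustration, and the sketch assumes that codelet times are independent and simply add up to the program time.

```python
from typing import Dict, Tuple

# Hypothetical parameter space: (compiler flag, thread count).
Config = Tuple[str, int]

# Hypothetical replay times in seconds, codelet -> {config: time}.
# These numbers are illustrative only and do not come from the paper.
REPLAY_TIMES: Dict[str, Dict[Config, float]] = {
    "loop_a": {("-O2", 4): 2.0, ("-O3", 4): 1.5, ("-O3", 8): 1.8},
    "loop_b": {("-O2", 4): 1.0, ("-O3", 4): 1.4, ("-O3", 8): 0.9},
    "region_c": {("-O2", 4): 3.0, ("-O3", 4): 3.2, ("-O3", 8): 2.5},
}


def whole_program_best(times: Dict[str, Dict[Config, float]]) -> Tuple[Config, float]:
    """Pick the single configuration that minimizes the summed time."""
    configs = list(next(iter(times.values())).keys())
    totals = {cfg: sum(per_codelet[cfg] for per_codelet in times.values())
              for cfg in configs}
    best = min(totals, key=totals.get)
    return best, totals[best]


def piecewise_best(times: Dict[str, Dict[Config, float]]) -> Tuple[Dict[str, Config], float]:
    """Pick the best configuration separately for each codelet."""
    choice = {name: min(per_codelet, key=per_codelet.get)
              for name, per_codelet in times.items()}
    total = sum(times[name][cfg] for name, cfg in choice.items())
    return choice, total


if __name__ == "__main__":
    print("whole-program:", whole_program_best(REPLAY_TIMES))
    print("piecewise:", piecewise_best(REPLAY_TIMES))
```

Under the additivity assumption, the piecewise total can never exceed the whole-program total, because each codelet is free to keep the globally best configuration. Real applications may violate that assumption, for example when regions interact through cache state or thread placement.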
Pages: 238-250
Page count: 13