Piecewise Holistic Autotuning of Compiler and Runtime Parameters

Cited by: 4
Authors
Popov, Mihail [1]
Akel, Chadi [2]
Jalby, William [1]
Castro, Pablo de Oliveira [1]
Affiliations
[1] Univ Versailles St Quentin En Yvelines, Univ Paris Saclay, Versailles, France
[2] Exascale Comp Res, Versailles, France
Keywords
CODE
DOI
10.1007/978-3-319-43659-3_18
Chinese Library Classification
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
The complexity of current architectures requires fine-tuning compiler and runtime parameters to reach their full performance potential. Autotuning substantially improves on default parameters in many scenarios, but it is a costly process that requires long iterative evaluation. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program. Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations that share the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover, unlike whole-program tuning, CERE customizes the set of best parameters for each individual OpenMP region or loop. We demonstrate CERE tuning of compiler optimizations, number of threads, and thread affinity on a NUMA architecture. On average over the NAS 3.0 benchmarks, we achieve a 1.08x speedup after tuning. Tuning a single codelet is 13x cheaper than whole-program evaluation and estimates the tuning impact on the original region with 94.7% accuracy. On a Reverse Time Migration (RTM) proto-application, we achieve a 1.11x speedup with a 200x cheaper exploration.
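As a rough illustration of why per-region tuning can beat a single whole-program configuration, the following Python sketch compares the two selection strategies on a hypothetical timing table. It assumes that each region's runtime under every candidate configuration is already known (for instance, from replaying its codelet) and that total runtime is simply the sum of region runtimes. Real regions can interact through caches or thread placement, so this additivity is an assumption of the sketch, not CERE's model. All function names, region names, configuration labels and timings are hypothetical, and `estimate_region_time` is only a simplified stand-in for extrapolating a region's cost from one replayed representative per group of invocations.

```python
"""Minimal sketch of piecewise vs. whole-program autotuning.

Assumptions (not taken from CERE itself): each region's runtime under a
configuration is already known, e.g. from replaying its codelet, and the
program's runtime is the sum of its regions' runtimes (no cross-region
interaction). Region names, configuration labels and timings are hypothetical.
"""
from typing import Dict, List, Tuple

Timings = Dict[str, Dict[str, float]]  # region -> configuration -> seconds


def estimate_region_time(groups: List[Tuple[float, int]]) -> float:
    """Extrapolate a region's total time from grouped invocations.

    Each group is (time of one replayed representative, number of invocations
    in the group), so only one replay per group is needed.
    """
    return sum(rep_time * count for rep_time, count in groups)


def whole_program_best(timings: Timings) -> Tuple[str, float]:
    """Return the single configuration minimizing the summed runtime."""
    configs = next(iter(timings.values())).keys()
    totals = {c: sum(t[c] for t in timings.values()) for c in configs}
    best = min(totals, key=totals.get)
    return best, totals[best]


def piecewise_best(timings: Timings) -> Tuple[Dict[str, str], float]:
    """Return the fastest configuration for each region independently."""
    choice = {region: min(t, key=t.get) for region, t in timings.items()}
    total = sum(timings[region][c] for region, c in choice.items())
    return choice, total


if __name__ == "__main__":
    # Hypothetical timings in seconds, not measurements from the paper.
    timings = {
        "loop_a": {"O2_8threads": 4.0, "O3_16threads": 3.0},
        "loop_b": {"O2_8threads": 2.0, "O3_16threads": 2.5},
    }
    print(whole_program_best(timings))  # ('O3_16threads', 5.5)
    print(piecewise_best(timings))
    # ({'loop_a': 'O3_16threads', 'loop_b': 'O2_8threads'}, 5.0)
    print(estimate_region_time([(0.5, 4), (2.0, 2)]))  # 6.0
```

In this made-up table no single configuration is best for both loops, so choosing per region (5.0 s) beats the best whole-program choice (5.5 s). When one configuration is fastest for every region, the two strategies select the same configuration.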
Pages: 238-250
Number of pages: 13
Related papers
50 in total
  • [41] A Compiler and Runtime System for Enabling Data Mining Applications on GPUs
    Ma, Wenjing
    Agrawal, Gagan
    ACM SIGPLAN NOTICES, 2009, 44 (04) : 287 - 288
  • [42] MetaVM: A transparent distributed object system supported by runtime compiler
    Shudo, K
    Muraoka, Y
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V, 2000 : 879 - 882
  • [43] Leveraging Compiler Optimizations to Reduce Runtime Fault Recovery Overhead
    Hosseini, Fateme S.
    Fotouhi, Pouya
    Yang, Chengmo
    Gao, Guang R.
    PROCEEDINGS OF THE 2017 54TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2017
  • [44] Polyhedral Compiler Technology in Collaboration with Autotuning Important to Domain-Specific Frameworks for HPC
    Hall, Mary
    Basu, Protonu
    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2016, 2017, 10136 : 101 - 105
  • [45] RCHC: a Holistic Runtime System for Concurrent Heterogeneous Computing
    Park, Jinsu
    Baek, Woongki
    PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016, 2016 : 211 - 216
  • [46] Autotuning FPGA design parameters for performance and power
    Mametjanov, Azamat
    Balaprakash, Prasanna
    Choudary, Chekuri
    Hovland, Paul D.
    Wild, Stefan M.
    Sabin, Gerald
    2015 IEEE 23RD ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2015 : 84 - 91
  • [47] Optimization of Compiler-Generated OpenCL CNN Kernels and Runtime for FPGAs
    Chung, Seung-Hun
    Abdelrahman, Tarek S.
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022 : 100 - 103
  • [48] Android App Energy Efficiency: The Impact of Language, Runtime, Compiler and Implementation
    Chen, Xinbo
    Zong, Ziliang
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016), 2016 : 485 - 492
  • [49] Evaluation of compiler and runtime library approaches for supporting parallel regular applications
    Chakrabarti, DR
    Banerjee, P
    Lain, A
    FIRST MERGED INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM & SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING, 1998 : 74 - 79