Accelerated Auto-Tuning of GPU Kernels for Tensor Computations

Authors
Li, Chendi [1]
Xu, Yufan [1]
Saravani, Sina Mahdipour [1]
Sadayappan, P. [1]
Affiliations
[1] Univ Utah, Salt Lake City, UT 84112 USA
Funding
U.S. National Science Foundation
Keywords
Auto-tuning; Design space exploration; GPU kernel optimization; Neural networks; Performance modeling; Tile-size optimization;
DOI
10.1145/3650200.3656626
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
TVM is a state-of-the-art auto-tuning compiler for the synthesis of high-performance implementations of tensor computations. However, an extensive search in the vast design space via thousands of compile-execute trials is often needed to identify high-performance code versions, leading to high auto-tuning time. This paper develops new performance modeling and design space exploration strategies to accelerate the code optimization process within TVM. Experimental evaluation on a number of matrix-matrix multiplication and 2D convolution kernels demonstrates about an order-of-magnitude improvement in auto-tuning time to achieve the same level of code performance.
Pages: 549-561
Page count: 13