Accelerated Auto-Tuning of GPU Kernels for Tensor Computations

Cited by: 0

Authors:

Li, Chendi [1]

Xu, Yufan [1]

Saravani, Sina Mahdipour [1]

Sadayappan, P. [1]

Affiliations:

[1] Univ Utah, Salt Lake City, UT 84112 USA

Source:

PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024 | 2024

Funding:

U.S. National Science Foundation

Keywords:

Auto-tuning; Design space exploration; GPU kernel optimization; Neural networks; Performance modeling; Tile-size optimization

DOI:

10.1145/3650200.3656626

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

TVM is a state-of-the-art auto-tuning compiler for the synthesis of high-performance implementations of tensor computations. However, an extensive search in the vast design space via thousands of compile-execute trials is often needed to identify high-performance code versions, leading to high auto-tuning time. This paper develops new performance modeling and design space exploration strategies to accelerate the code optimization process within TVM. Experimental evaluation on a number of matrix-matrix multiplication and 2D convolution kernels demonstrates about an order-of-magnitude improvement in auto-tuning time to achieve the same level of code performance.


Pages: 549-561

Number of pages: 13

50 items in total

[1] Bayesian Optimization for auto-tuning GPU kernels
Willemsen, Floris-Jan
van Nieuwpoort, Rob
van Werkhoven, Ben
PROCEEDINGS OF PERFORMANCE MODELING, BENCHMARKING AND SIMULATION OF HIGH PERFORMANCE COMPUTER SYSTEMS (PMBS 2021), 2021, : 106 - 117
[2] Benchmarking Optimization Algorithms for Auto-Tuning GPU Kernels
Schoonhoven, Richard Arnoud
van Werkhoven, Ben
Batenburg, Kees Joost
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2023, 27 (03) : 550 - 564
[3] A Fine-grained Prefetching Scheme for DGEMM Kernels on GPU with Auto-tuning Compatibility
Li, Jialin
Ye, Huang
Tian, Shaobo
Li, Xinyuan
Zhang, Jian
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), 2022, : 863 - 874
[4] Optimizing and Auto-tuning Belief Propagation on the GPU
Grauer-Gray, Scott
Cavazos, John
LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, 2011, 6548 : 121 - 135
[5] Toward Techniques for Auto-tuning GPU Algorithms
Davidson, Andrew
Owens, John
APPLIED PARALLEL AND SCIENTIFIC COMPUTING, PT II, 2012, 7134 : 110 - 119
[6] Adaptive GPU Array Layout Auto-Tuning
Weber, Nicolas
Goesele, Michael
PROCEEDINGS OF THE ACM WORKSHOP ON SOFTWARE ENGINEERING METHODS FOR PARALLEL AND HIGH PERFORMANCE APPLICATIONS (SEM4HPC'16), 2016, : 21 - 28
[7] Testing and Auto-Tuning GPU code with Kernel Tuner
van Werkhoven, Ben
2019 18TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC 2019), 2019, : XXI - XXI
[8] Auto-Tuning GEMV on Many-Core GPU
Xu, Weizhi
Liu, Zhiyong
Wu, Jun
Ye, Xiaochun
Jiao, Shuai
Wang, Da
Song, Fenglong
Fan, Dongrui
PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012, : 30 - 36
[9] PERI - Auto-tuning memory-Intensive kernels for multicore
Williams, Samuel
Datta, Kaushik
Carter, Jonathan
Oliker, Leonid
Shalf, John
Yelick, Katherine
Bailey, David
SCIDAC 2008: SCIENTIFIC DISCOVERY THROUGH ADVANCED COMPUTING, 2008, 125
[10] GPU Auto-tuning Framework for Optimal Performance and Power Consumption
Cheema, Sunbal
Khan, Gul N.
15TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPU, GPGPU 2023, 2023, : 1 - 6
