Accelerated Auto-Tuning of GPU Kernels for Tensor Computations

Times cited: 0
Authors
Li, Chendi [1]
Xu, Yufan [1]
Saravani, Sina Mahdipour [1]
Sadayappan, P. [1]
Affiliation
[1] University of Utah, Salt Lake City, UT 84112, USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Auto-tuning; Design space exploration; GPU kernel optimization; Neural networks; Performance modeling; Tile-size optimization
DOI
10.1145/3650200.3656626
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
TVM is a state-of-the-art auto-tuning compiler for the synthesis of high-performance implementations of tensor computations. However, an extensive search in the vast design space via thousands of compile-execute trials is often needed to identify high-performance code versions, leading to high auto-tuning time. This paper develops new performance modeling and design space exploration strategies to accelerate the code optimization process within TVM. Experimental evaluation on a number of matrix-matrix multiplication and 2D convolution kernels demonstrates about an order-of-magnitude improvement in auto-tuning time to achieve the same level of code performance.
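The abstract argues that a performance model can replace most compile-execute trials. A toy tile-size search for matrix-matrix multiplication illustrates that idea. The sketch is not the authors' model and does not use TVM's API, and every function name in it is hypothetical. It makes these assumptions:

* a 48 KiB shared-memory limit per block
* an 80-SM GPU
* float32 data
* a deliberately crude cost function: global-memory loads multiplied by an under-occupancy penalty

The sketch enumerates legal (tm, tn, tk) tile sizes, ranks them with the model, and keeps only a short list for real measurement.

```python
from itertools import product

# Hypothetical hardware assumptions, not taken from the paper.
SHARED_MEM_BYTES = 48 * 1024  # assumed shared-memory budget per thread block
BYTES_PER_ELEM = 4            # float32
NUM_SMS = 80                  # assumed number of streaming multiprocessors


def candidate_tiles(M, N, K, sizes=(8, 16, 32, 64, 128)):
    """Yield (tm, tn, tk) tiles that divide the problem evenly and fit in shared memory."""
    for tm, tn, tk in product(sizes, repeat=3):
        if M % tm or N % tn or K % tk:
            continue
        # A-tile (tm x tk) plus B-tile (tk x tn) staged in shared memory.
        smem = (tm * tk + tk * tn) * BYTES_PER_ELEM
        if smem <= SHARED_MEM_BYTES:
            yield (tm, tn, tk)


def model_cost(M, N, K, tile, num_sms=NUM_SMS):
    """Toy analytical cost: global-memory elements loaded, penalized for idle SMs.

    tk does not affect this toy cost; a real model would account for it.
    """
    tm, tn, _ = tile
    # Each block streams a tm x K strip of A and a K x tn strip of B.
    loads = M * N * K * (1 / tm + 1 / tn)
    blocks = (M // tm) * (N // tn)
    # Fewer blocks than SMs leaves part of the GPU idle.
    occupancy_penalty = max(1.0, num_sms / blocks)
    return loads * occupancy_penalty


def model_guided_shortlist(M, N, K, top_k=5):
    """Rank every legal tile with the model; only top_k would be compiled and timed."""
    tiles = list(candidate_tiles(M, N, K))
    ranked = sorted(tiles, key=lambda t: model_cost(M, N, K, t))
    return tiles, ranked[:top_k]


if __name__ == "__main__":
    all_tiles, shortlist = model_guided_shortlist(1024, 1024, 1024, top_k=5)
    print(len(all_tiles), "legal configs;", len(shortlist), "would be measured")
    print(shortlist)
```

Consider M = N = K = 1024. Under these assumptions, the search space holds 114 legal configurations, and only 5 of them would be compiled and timed. A usable model would also need to capture register pressure, thread mapping, and latency hiding, which this toy cost ignores.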
Pages: 549-561
Number of pages: 13
Related papers (50 in total)
  • [31] Auto-tuning GEMM kernels on the Intel KNL and Intel Skylake-SP processors
    Lim, Roktaek
    Lee, Yeongha
    Kim, Raehyun
    Choi, Jaeyoung
    Lee, Myungho
    JOURNAL OF SUPERCOMPUTING, 2019, 75(12): 7895-7908
  • [33] Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation
    Steuwer, Michel
    Remmelg, Toomas
    Dubach, Christophe
    2016 INTERNATIONAL CONFERENCE ON COMPILERS, ARCHITECTURE AND SYNTHESIS FOR EMBEDDED SYSTEMS (CASES), 2016
  • [34] Vibration control of milling machine by using auto-tuning magnetic damper and auto-tuning vibration absorber
    Nagaya, K
    Kobayasi, J
    Imai, K
    INTERNATIONAL JOURNAL OF APPLIED ELECTROMAGNETICS AND MECHANICS, 2002, 16(1-2): 111-123
  • [35] Pushing the Limits of Online Auto-tuning: Machine Code Optimization in Short-Running Kernels
    Endo, Fernando
    Courousse, Damien
    Charles, Henri-Pierre
    2016 IEEE 10TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC), 2016: 265-272
  • [36] Auto-tuning of cascade control systems
    Song, SH
    Xie, LH
    Cai, WJ
    PROCEEDINGS OF THE 4TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-4, 2002: 3339-3343
  • [37] ATF: A Generic Auto-Tuning Framework
    Rasch, Ari
    Haidl, Michael
    Gorlatch, Sergei
    2017 19TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS (HPCC) / 2017 15TH IEEE INTERNATIONAL CONFERENCE ON SMART CITY (SMARTCITY) / 2017 3RD IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (DSS), 2017: 64-71
  • [38] Auto-tuning interactive multiple model
    Ng, GW
    Lau, A
    How, KY
    ACQUISITION, TRACKING, AND POINTING XII, 1998, 3365: 131-138
  • [39] Survey on PID auto-tuning modules
    Ang, KH
    Yun, L
    PROCEEDINGS OF THE 5TH ASIA-PACIFIC CONFERENCE ON CONTROL & MEASUREMENT, 2002: 148-153
  • [40] ATF: A Generic Auto-Tuning Framework
    Rasch, Ari
    Gorlatch, Sergei
    HPDC '18: PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING: POSTERS/DOCTORAL CONSORTIUM, 2018: 3-4