Optimizing parallel GEMM routines using auto-tuning with Intel AVX-512

Cited by: 16
Authors
Kim, Raehyun [1]
Choi, Jaeyoung [1]
Lee, Myungho [2]
Affiliations
[1] Soongsil Univ, Seoul, South Korea
[2] Myongji Univ, Yongin, Gyeonggi, South Korea
Keywords
Manycore; Intel Xeon; Intel Xeon Phi; Autotuning; matrix-matrix multiplication; AVX-512
DOI
10.1145/3293320.3293334
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents optimal implementations of single- and double-precision general matrix-matrix multiplication (GEMM) routines for the Intel Xeon Phi Processor code-named Knights Landing (KNL) and the Intel Xeon Scalable Processors, based on an auto-tuning approach that uses the Intel AVX-512 intrinsic functions. The auto-tuning approach precisely determines the parameters that reflect the target architectural features. It significantly reduces the search space and derives optimal parameter sets, including the size of submatrices, prefetch distances, loop unrolling depth, and parallelization scheme. Without a single line of assembly code, our GEMM kernels achieve performance comparable to the Intel MKL and outperform other open-source BLAS libraries.
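As a rough illustration of what auto-tuning a GEMM parameter set can look like, the sketch below times a tiled matrix multiplication over a small grid of candidate submatrix sizes and keeps the fastest. It is not the authors' method: it uses plain Python instead of AVX-512 intrinsics, searches exhaustively instead of pruning the search space, and tunes only tile sizes (no prefetch distance, unrolling depth, or parallelization scheme). The names `blocked_gemm`, `autotune`, `mb`, `nb`, and `kb` are hypothetical and chosen for this example only.

```python
import itertools
import time


def blocked_gemm(A, B, M, N, K, mb, nb, kb):
    """Return C = A * B using loop tiling.

    A is M x K and B is K x N, both row-major lists of lists.
    mb, nb, kb are the (hypothetical) tile sizes along M, N, and K.
    """
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, mb):
        for k0 in range(0, K, kb):
            for j0 in range(0, N, nb):
                # Multiply one mb x kb tile of A by one kb x nb tile of B.
                for i in range(i0, min(i0 + mb, M)):
                    Ci, Ai = C[i], A[i]
                    for k in range(k0, min(k0 + kb, K)):
                        a, Bk = Ai[k], B[k]
                        for j in range(j0, min(j0 + nb, N)):
                            Ci[j] += a * Bk[j]
    return C


def autotune(M, N, K, candidates, repeats=2):
    """Time every (mb, nb, kb) candidate and return (best_params, best_seconds).

    This is an exhaustive search, which is an assumption of this sketch;
    the paper describes reducing the search space instead.
    """
    A = [[float(i + k) for k in range(K)] for i in range(M)]
    B = [[float(k - j) for j in range(N)] for k in range(K)]
    best_params, best_time = None, float("inf")
    for mb, nb, kb in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_gemm(A, B, M, N, K, mb, nb, kb)
            # Keep the minimum over repeats to reduce timing noise.
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_params, best_time = (mb, nb, kb), elapsed
    return best_params, best_time


if __name__ == "__main__":
    grid = list(itertools.product([4, 8, 16], repeat=3))
    params, seconds = autotune(24, 24, 24, grid)
    print("best (mb, nb, kb):", params, "time (s):", seconds)
```

The winning tile sizes depend on the machine and on timing noise, so the printed result will vary from run to run. In a native-code kernel the same search loop would wrap compiled variants of the kernel instead of an interpreted function.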
Pages: 101-110
Page count: 10