On the Performance Prediction of BLAS-based Tensor Contractions

Cited by: 11
Authors
Peise, Elmar [1]
Fabregat-Traver, Diego [1]
Bientinesi, Paolo [1]
Affiliation
[1] Rhein Westfal TH Aachen, AICES, D-52062 Aachen, Germany
Keywords
SET;
DOI
10.1007/978-3-319-17248-4_10
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Tensor operations are surging as the computational building blocks for a variety of scientific simulations, and the development of high-performance kernels for such operations is known to be a challenging task. While for operations on one- and two-dimensional tensors there exist standardized interfaces and highly optimized libraries (BLAS), for higher-dimensional tensors neither standards nor highly tuned implementations exist yet. In this paper, we consider contractions between two tensors of arbitrary dimensionality and take on the challenge of generating high-performance implementations by resorting to sequences of BLAS kernels. The approach consists in breaking the contraction down into operations that only involve matrices or vectors. Since in general there are many alternative ways of decomposing a contraction, we are able to methodically derive a large family of algorithms. The main contribution of this paper is a systematic methodology to accurately identify the fastest algorithms in this family without executing them. The goal is instead accomplished with the help of a set of cache-aware micro-benchmarks for the underlying BLAS kernels. The predictions we construct from such benchmarks allow us to reliably single out the best-performing algorithms in a tiny fraction of the time taken by the direct execution of the algorithms.
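As a rough illustration of the idea in the abstract, the following Python sketch decomposes one contraction, C[a][b][c] = sum_i A[a][i] * B[i][b][c], in two ways: as a loop over matrix-matrix products and as a loop over matrix-vector products. It then estimates each variant's runtime by timing a single kernel call and multiplying by the number of calls. Everything here is an assumption made for illustration: the example contraction, the function names, and the pure-Python `gemm` and `gemv`, which merely stand in for the BLAS kernels. The timing helper also ignores cache state, whereas the micro-benchmarks described in the abstract are cache-aware.

```python
import time

def gemv(A, x):
    """Pure-Python stand-in for a BLAS matrix-vector kernel: y = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def gemm(A, B):
    """Pure-Python stand-in for a BLAS matrix-matrix kernel: C = A B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def _dims(A, B):
    # A is na x ni, B is ni x nb x nc (nested lists).
    return len(A), len(B), len(B[0]), len(B[0][0])

def contract_gemm(A, B):
    """C[a][b][c] = sum_i A[a][i] * B[i][b][c], one gemm per index c."""
    na, ni, nb, nc = _dims(A, B)
    C = [[[0.0] * nc for _ in range(nb)] for _ in range(na)]
    for c in range(nc):
        B_slice = [[B[i][b][c] for b in range(nb)] for i in range(ni)]
        M = gemm(A, B_slice)
        for a in range(na):
            for b in range(nb):
                C[a][b][c] = M[a][b]
    return C

def contract_gemv(A, B):
    """Same contraction, one gemv per index pair (b, c)."""
    na, ni, nb, nc = _dims(A, B)
    C = [[[0.0] * nc for _ in range(nb)] for _ in range(na)]
    for b in range(nb):
        for c in range(nc):
            fiber = [B[i][b][c] for i in range(ni)]
            y = gemv(A, fiber)
            for a in range(na):
                C[a][b][c] = y[a]
    return C

def time_call(kernel, *args, repeats=5):
    """Best-of-N wall time of one kernel call (a crude micro-benchmark)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(*args)
        best = min(best, time.perf_counter() - start)
    return best

def predict(A, B):
    """Estimate each variant as (number of kernel calls) x (time of one call)."""
    na, ni, nb, nc = _dims(A, B)
    B_slice = [[B[i][b][0] for b in range(nb)] for i in range(ni)]
    fiber = [B[i][0][0] for i in range(ni)]
    return {
        "gemm": nc * time_call(gemm, A, B_slice),
        "gemv": nb * nc * time_call(gemv, A, fiber),
    }
```

Calling `predict(A, B)` returns one estimate per variant, and the variant with the smallest estimate would be the one selected, without running either full contraction. The estimate covers kernel time only and leaves out the cost of copying slices, which a real BLAS call would avoid by passing strides.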
Pages: 193-212
Number of pages: 20