On the Performance Prediction of BLAS-based Tensor Contractions

Cited by: 11
Authors
Peise, Elmar [1 ]
Fabregat-Traver, Diego [1 ]
Bientinesi, Paolo [1 ]
Affiliations
[1] Rhein Westfal TH Aachen, AICES, D-52062 Aachen, Germany
Keywords
SET;
DOI
10.1007/978-3-319-17248-4_10
CLC Number
TP [Automation technology, computer technology];
Subject Classification Code
0812
Abstract
Tensor operations are surging as the computational building blocks for a variety of scientific simulations, and the development of high-performance kernels for such operations is known to be a challenging task. While for operations on one- and two-dimensional tensors there exist standardized interfaces and highly-optimized libraries (BLAS), for higher-dimensional tensors neither standards nor highly-tuned implementations exist yet. In this paper, we consider contractions between two tensors of arbitrary dimensionality and take on the challenge of generating high-performance implementations by resorting to sequences of BLAS kernels. The approach consists in breaking the contraction down into operations that only involve matrices or vectors. Since in general there are many alternative ways of decomposing a contraction, we are able to methodically derive a large family of algorithms. The main contribution of this paper is a systematic methodology to accurately identify the fastest algorithms among them, without executing them. The goal is instead accomplished with the help of a set of cache-aware micro-benchmarks for the underlying BLAS kernels. The predictions we construct from such benchmarks allow us to reliably single out the best-performing algorithms in a tiny fraction of the time taken by the direct execution of the algorithms.
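As an illustration of the idea described in the abstract (this is a sketch, not code from the paper), the following NumPy fragment shows two alternative BLAS-level decompositions of a hypothetical contraction C[i,j,k] = Σ_m A[i,m]·B[m,j,k]: a loop of small GEMMs over one free index, versus a single larger GEMM after flattening two indices. Both produce the same tensor, but their performance can differ substantially, which is why predicting the fastest variant without executing all of them matters. All shapes here are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical contraction (not taken from the paper):
#   C[i,j,k] = sum_m A[i,m] * B[m,j,k]
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 30))        # i x m
B = rng.standard_normal((30, 20, 10))    # m x j x k

# Variant 1: one GEMM per slice along the free index k.
C1 = np.empty((40, 20, 10))
for k in range(B.shape[2]):
    C1[:, :, k] = A @ B[:, :, k]         # matrix-matrix product on a slice

# Variant 2: flatten (j,k) into a single dimension, perform one
# larger GEMM, then restore the three-dimensional shape.
C2 = (A @ B.reshape(30, 200)).reshape(40, 20, 10)

# Same mathematical result from two different BLAS decompositions.
assert np.allclose(C1, C2)
```

In general, each free or contracted index offers such a choice (slice-and-loop vs. flatten-and-merge), which is what generates the large family of algorithms the paper's prediction methodology ranks.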
Pages: 193-212 (20 pages)
Related Papers (50 total)
  • [1] Tensor Contractions with Extended BLAS Kernels on CPU and GPU
    Shi, Yang
    Niranjan, U. N.
    Anandkumar, Animashree
    Cecka, Cris
    PROCEEDINGS OF 2016 IEEE 23RD INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2016, : 193 - 202
  • [2] Towards an efficient use of the BLAS library for multilinear tensor contractions
    Di Napoli, Edoardo
    Fabregat-Traver, Diego
    Quintana-Orti, Gregorio
    Bientinesi, Paolo
    APPLIED MATHEMATICS AND COMPUTATION, 2014, 235 : 454 - 468
  • [3] A BLAS-Based Algorithm for Finding Position Weight Matrix Occurrences in DNA Sequences on CPUs and GPUs
    Fostier, Jan
    BIOINFORMATICS AND BIOMEDICAL ENGINEERING, IWBBIO 2018, PT I, 2018, 10813 : 439 - 449
  • [4] BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs
    Fostier, Jan
    BMC BIOINFORMATICS, 21
  • [6] Accelerating Robust-Object-Tracking via Level-3 BLAS-Based Sparse Learning
    He, Zhaoshui
    Liang, Hao
    Yang, Senquan
    Su, Wenqing
    Wang, Peitao
    Lin, Zhijie
    Tan, Beihai
    Xie, Shengli
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5908 - 5920
  • [7] Design of a high-performance tensor-matrix multiplication with BLAS
    Bassoy, Cem Savas
    JOURNAL OF COMPUTATIONAL SCIENCE, 2025, 87
  • [8] Design of a High-Performance Tensor-Vector Multiplication with BLAS
    Bassoy, Cem
    COMPUTATIONAL SCIENCE - ICCS 2019, PT I, 2019, 11536 : 32 - 45
  • [9] High-Performance Tensor Contractions for GPUs
    Abdelfattah, A.
    Baboulin, M.
    Dobrev, V.
    Dongarra, J.
    Earl, C.
    Falcou, J.
    Haidar, A.
    Karlin, I.
    Kolev, Tz.
    Masliah, I.
    Tomov, S.
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016), 2016, 80 : 108 - 118
  • [10] A Code Generator for High-Performance Tensor Contractions on GPUs
    Kim, Jinsung
    Sukumaran-Rajam, Aravind
    Thumma, Vineeth
    Krishnamoorthy, Sriram
    Panyala, Ajay
    Pouchet, Louis-Noel
    Rountev, Atanas
    Sadayappan, P.
    PROCEEDINGS OF THE 2019 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO '19), 2019, : 85 - 95