Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs

Cited by: 1
Authors
Ahmad, Khalid [1 ,3 ]
Cecka, Cris [2 ,4 ]
Garland, Michael [2 ,4 ]
Hall, Mary [1 ,3 ]
Affiliations
[1] Univ Utah, Salt Lake City, UT USA
[2] NVIDIA Corp, Santa Clara, CA USA
[3] Univ Utah, Salt Lake City, UT 84108 USA
[4] NVIDIA Corp, Santa Clara, CA 95051 USA
Keywords
Sparse tensors; SpMM; data layout;
DOI
10.1145/3633462
CLC classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and related applications. SpTM is a multi-dimensional analog of sparse-matrix-dense-matrix multiplication (SpMM). In this article, we employ a hierarchical tensor data layout that can unfold a multidimensional tensor into a 2D matrix, making it possible to compute SpTM using SpMM kernel implementations for GPUs. We compare two SpMM-based implementations against the state-of-the-art PASTA sparse tensor contraction implementation: (1) SpMM with the hierarchical tensor data layout; and (2) unfolding followed by an invocation of cuSPARSE's SpMM. Results show that SpMM can outperform PASTA 70.9% of the time, but none of the three approaches is best overall. Therefore, we use a decision tree classifier to select the best-performing sparse tensor contraction kernel based on precomputed properties of the sparse tensor.
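The core idea in the abstract, reducing SpTM to SpMM by unfolding the sparse tensor into a 2D matrix, can be illustrated with a minimal sketch. This is not the paper's code; it is a hypothetical example using SciPy's generic sparse matrices (rather than the paper's hierarchical layout or cuSPARSE), contracting a sparse 3-D tensor with a dense matrix along its last mode.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical sketch: SpTM via unfolding + SpMM.
# Contract a sparse 3-D tensor X (I x J x K) with a dense matrix U (K x R)
# along the last mode: Y[i, j, r] = sum_k X[i, j, k] * U[k, r].

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)

# Build a random sparse tensor in COO form (distinct coordinates + values).
nnz = 20
flat = rng.choice(I * J * K, size=nnz, replace=False)
coords = np.unravel_index(flat, (I, J, K))
vals = rng.standard_normal(nnz)

# Unfold: flatten modes (i, j) into rows, keep mode k as columns,
# yielding a 2-D sparse matrix of shape (I*J, K).
rows = coords[0] * J + coords[1]
X_unf = sp.coo_matrix((vals, (rows, coords[2])), shape=(I * J, K)).tocsr()

U = rng.standard_normal((K, R))

# SpTM now reduces to a single SpMM on the unfolded tensor...
Y = (X_unf @ U).reshape(I, J, R)  # ...then fold the result back to 3-D.

# Check against a dense einsum reference.
X_dense = np.zeros((I, J, K))
X_dense[coords] = vals
Y_ref = np.einsum('ijk,kr->ijr', X_dense, U)
assert np.allclose(Y, Y_ref)
```

The unfolding step is pure index arithmetic on the nonzero coordinates, which is why a layout that stores coordinates hierarchically can expose the 2D view without materializing a new structure.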
Pages: 20
Related papers
50 records in total
  • [31] An Input-Adaptive and In-Place Approach to Dense Tensor-Times-Matrix Multiply
    Li, Jiajia
    Battaglino, Casey
    Perros, Ioakeim
    Sun, Jimeng
    Vuduc, Richard
    PROCEEDINGS OF SC15: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2015,
  • [32] "Wide or Tall" and "Sparse Matrix Dense Matrix" Multiplications
    Howell, Gary W.
    HIGH PERFORMANCE COMPUTING SYMPOSIUM 2011 (HPC 2011) - 2011 SPRING SIMULATION MULTICONFERENCE - BK 6 OF 8, 2011, 43 (02): : 159 - 165
  • [33] A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors
    Liu, Weifeng
    Vinter, Brian
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2015, 85 : 47 - 61
  • [34] Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs
    Wei, Bingxin
    Wang, Yizhuo
    Chang, Fangli
    Gao, Jianhua
    Ji, Weixing
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2024, 38 (03): : 245 - 259
  • [35] Advancing on an efficient sparse matrix multiplication kernel for modern GPUs
    Berger, Gonzalo
    Freire, Manuel
    Marini, Renzo
    Dufrechou, Ernesto
    Ezzatti, Pablo
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023, 35 (20):
  • [36] RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
    Brock, Benjamin
    Buluc, Aydin
    Yelick, Katherine
    PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024, 2024, : 225 - 235
  • [37] Optimizing Sparse Matrix Operations on GPUs using Merge Path
    Dalton, Steven
    Olson, Luke
    Baxter, Sean
    Merrill, Duane
    Garland, Michael
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 407 - 416
  • [38] Optimization techniques for sparse matrix-vector multiplication on GPUs
    Maggioni, Marco
    Berger-Wolf, Tanya
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2016, 93-94 : 66 - 86
  • [39] A new approach for sparse matrix vector product on NVIDIA GPUs
    Vazquez, F.
    Fernandez, J. J.
    Garzon, E. M.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (08): : 815 - 826
  • [40] On Implementing Sparse Matrix Multi-Vector Multiplication on GPUs
    Abu-Sufah, Walid
    Ahmad, Khalid
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 1117 - 1124