Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs

Cited by: 1
Authors
Ahmad, Khalid [1]
Cecka, Cris [2]
Garland, Michael [2]
Hall, Mary [1]
Affiliations
[1] Univ Utah, Salt Lake City, UT 84108 USA
[2] NVIDIA Corp, Santa Clara, CA 95051 USA
Keywords
Sparse tensors; SpMM; data layout
DOI
10.1145/3633462
Chinese Library Classification
TP3 [computing technology, computer technology]
Subject Classification Code
0812
Abstract
An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and its applications. SpTM is a multi-dimensional analog of sparse-matrix-dense-matrix multiplication (SpMM). In this article, we employ a hierarchical tensor data layout that can unfold a multidimensional tensor into a 2D matrix, making it possible to compute SpTM using SpMM kernel implementations for GPUs. We compare two SpMM-based implementations against the state-of-the-art PASTA sparse tensor contraction implementation: (1) SpMM with the hierarchical tensor data layout; and (2) unfolding followed by an invocation of cuSPARSE's SpMM. Results show that SpMM can outperform PASTA 70.9% of the time, but none of the three approaches is best overall. We therefore use a decision tree classifier to select the best-performing sparse tensor contraction kernel based on precomputed properties of the sparse tensor.
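As a concrete illustration of the unfolding idea in the abstract, the sketch below performs a mode-n unfolding of a sparse COO tensor and then computes the tensor-times-matrix product with a single sparse-matrix-dense-matrix multiply. It is a minimal CPU analog written with NumPy/SciPy, not the paper's implementation: the tensor shape, coordinates, and values are invented for the example, and SciPy's CSR-times-dense product stands in for cuSPARSE's GPU SpMM.

import numpy as np
import scipy.sparse as sp

def unfold_for_ttm(shape, coords, vals, mode):
    # Rows: linearized indices of every mode except `mode`;
    # columns: the contracted mode. The CSR row order then matches
    # the row-major layout of the unfolded result tensor.
    rest = [m for m in range(len(shape)) if m != mode]
    rest_shape = [shape[m] for m in rest]
    rows = np.ravel_multi_index([coords[m] for m in rest], rest_shape)
    cols = coords[mode]
    n_rows = int(np.prod(rest_shape))
    return sp.csr_matrix((vals, (rows, cols)), shape=(n_rows, shape[mode]))

# A 4 x 3 x 5 sparse tensor X with six nonzeros (coordinates chosen by hand).
shape = (4, 3, 5)
coords = np.array([[0, 1, 3, 3, 2, 0],   # mode-0 indices
                   [0, 2, 1, 1, 0, 2],   # mode-1 indices
                   [4, 0, 2, 3, 1, 1]])  # mode-2 indices
vals = np.array([1.0, -2.0, 0.5, 3.0, -1.5, 2.0])

# Dense factor matrix U contracted against mode 1: Y[i,k,r] = sum_j X[i,j,k] U[j,r].
mode, R = 1, 16
rng = np.random.default_rng(0)
U = rng.standard_normal((shape[mode], R))

# One SpMM computes the whole sparse-tensor-times-dense-matrix product.
Y = unfold_for_ttm(shape, coords, vals, mode) @ U   # shape (4*5, 16)

# Check against a dense einsum reference.
X = np.zeros(shape)
X[tuple(coords)] = vals
assert np.allclose(Y, np.einsum('ijk,jr->ikr', X, U).reshape(-1, R))

Placing the contracted mode on the matrix columns means the CSR rows already follow the row-major layout of the result, so no permutation is needed after the multiply. The abstract's final step, choosing among PASTA, hierarchical-layout SpMM, and unfold-plus-cuSPARSE per tensor, can likewise be sketched with an off-the-shelf decision tree; the feature set and training data below are invented placeholders, not the paper's measured properties or labels.

from sklearn.tree import DecisionTreeClassifier

KERNELS = ["PASTA", "SpMM + hierarchical layout", "unfold + cuSPARSE SpMM"]

# Hypothetical per-tensor features: nnz, density, contracted-mode size,
# and product of the remaining mode sizes. Real training data would pair
# measured features with the fastest kernel observed for each tensor.
feats = np.array([[1e6, 1e-4,  512,  4096],
                  [5e4, 1e-2,   64,   256],
                  [2e7, 1e-6, 2048, 65536],
                  [8e5, 5e-4,  128,  1024]])
best = np.array([0, 1, 2, 1])  # fabricated labels indexing KERNELS

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(feats, best)
new_tensor = np.array([[3e5, 2e-4, 256, 2048]])  # properties of an unseen tensor
print("use:", KERNELS[int(clf.predict(new_tensor)[0])])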
Pages: 20