Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs

Cited by: 1

Authors:

Ahmad, Khalid [1,3]

Cecka, Cris [2,4]

Garland, Michael [2,4]

Hall, Mary [1,3]

Affiliations:

[1] Univ Utah, Salt Lake City, UT USA

[2] NVIDIA Corp, Santa Clara, CA USA

[3] Univ Utah, Salt Lake City, UT 84108 USA

[4] NVIDIA Corp, Santa Clara, CA 95051 USA

Source:

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION | 2024, Vol. 21, No. 1

Keywords:

Sparse tensors; SpMM; data layout

DOI:

10.1145/3633462

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and applications. SpTM is a multi-dimensional analog to sparse-matrix-dense-matrix multiplication (SpMM). In this article, we employ a hierarchical tensor data layout that can unfold a multidimensional tensor to derive a 2D matrix, making it possible to compute SpTM using SpMM kernel implementations for GPUs. We compare two SpMM implementations to the state-of-the-art PASTA sparse tensor contraction implementation using: (1) SpMM with hierarchical tensor data layout; and, (2) unfolding followed by an invocation of cuSPARSE's SpMM. Results show that SpMM can outperform PASTA 70.9% of the time, but none of the three approaches is best overall. Therefore, we use a decision tree classifier to identify the best performing sparse tensor contraction kernel based on precomputed properties of the sparse tensor.
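
The idea of reducing SpTM to SpMM by unfolding can be illustrated with a minimal CPU sketch in plain Python. It is not the article's hierarchical layout, its GPU kernels, or the cuSPARSE API; it only assumes a coordinate-format (COO) sparse tensor contracted along its last mode, and every name here (`unfold_last_mode`, `spmm_coo`, `sptm_last_mode`) is hypothetical.

```python
def unfold_last_mode(coords, shape):
    """Map each COO tensor index to a (row, col) matrix index.

    All modes except the last are flattened (row-major) into the row
    index; the last mode becomes the column index.
    """
    rows, cols = [], []
    for idx in coords:
        r = 0
        for i, dim in zip(idx[:-1], shape[:-1]):
            r = r * dim + i
        rows.append(r)
        cols.append(idx[-1])
    return rows, cols


def spmm_coo(rows, cols, vals, dense, n_rows):
    """Sparse (COO) matrix times dense matrix, returned as nested lists."""
    n_out = len(dense[0])
    out = [[0.0] * n_out for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, vals):
        for j in range(n_out):
            out[r][j] += v * dense[c][j]
    return out


def sptm_last_mode(coords, vals, shape, dense):
    """SpTM along the last mode, computed as unfold + SpMM."""
    rows, cols = unfold_last_mode(coords, shape)
    n_rows = 1
    for dim in shape[:-1]:
        n_rows *= dim
    return spmm_coo(rows, cols, vals, dense, n_rows)


# A 2 x 2 x 3 tensor with three nonzeros, times a dense 3 x 2 matrix.
coords = [(0, 0, 1), (1, 0, 2), (1, 1, 0)]
vals = [2.0, 3.0, 4.0]
dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = sptm_last_mode(coords, vals, (2, 2, 3), dense)
```

Each row of `result` corresponds to one flattened `(i, j)` pair of the leading modes, so the output can be reshaped back to a 2 x 2 x 2 tensor. Contracting along a different mode would require permuting the modes before unfolding, which this sketch does not cover.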

Pages: 20
