Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs

Cited by: 1
Authors
Ahmad, Khalid [1]
Cecka, Cris [2]
Garland, Michael [2]
Hall, Mary [1]
Affiliations
[1] Univ Utah, Salt Lake City, UT 84108 USA
[2] NVIDIA Corp, Santa Clara, CA 95051 USA
Keywords
Sparse tensors; SpMM; data layout
DOI
10.1145/3633462
CLC number
TP3 [Computing Technology, Computer Technology]
Subject classification code
0812
Abstract
An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and its applications. SpTM is a multi-dimensional analog of sparse-matrix-dense-matrix multiplication (SpMM). In this article, we employ a hierarchical tensor data layout that can unfold a multidimensional tensor into a 2D matrix, making it possible to compute SpTM using SpMM kernel implementations for GPUs. We compare two SpMM-based approaches against the state-of-the-art PASTA sparse tensor contraction implementation: (1) SpMM with the hierarchical tensor data layout; and (2) unfolding followed by an invocation of cuSPARSE's SpMM. Results show that SpMM can outperform PASTA 70.9% of the time, but none of the three approaches is best overall. Therefore, we use a decision tree classifier to identify the best-performing sparse tensor contraction kernel based on precomputed properties of the sparse tensor.
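Below is a minimal sketch of the unfolding step described in the abstract, assuming a third-order COO tensor and using scipy.sparse on the CPU as a stand-in for a GPU SpMM kernel such as cuSPARSE's. All shapes, variable names, and the mode-1 column-mapping convention are illustrative, not the paper's actual implementation.

```python
import numpy as np
import scipy.sparse as sp

# Sketch: mode-1 unfolding of a sparse I x J x K tensor X into a 2D
# sparse matrix X_(1) of shape I x (J*K), then a mode-1 tensor-times-
# matrix product computed entirely with an SpMM (sparse x dense) call.
I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)

nnz = 20
ii = rng.integers(0, I, nnz)   # COO coordinates and values
jj = rng.integers(0, J, nnz)
kk = rng.integers(0, K, nnz)
vals = rng.standard_normal(nnz)

# Mode-1 unfolding: entry (i, j, k) lands at row i, column j*K + k.
X1 = sp.coo_matrix((vals, (ii, jj * K + kk)), shape=(I, J * K)).tocsr()

# The mode-1 TTM  Y = X x_1 U  with dense U (R x I) satisfies
# Y_(1) = U @ X_(1); transposing puts the sparse operand on the left,
# matching an SpMM kernel's  C = A_sparse @ B_dense.
U = rng.standard_normal((R, I))
Y1 = (X1.T @ U.T).T            # sparse-times-dense, shape (R, J*K)
Y = Y1.reshape(R, J, K)        # fold the result back into a tensor

# Dense reference check of the unfold-then-SpMM result.
X_dense = np.zeros((I, J, K))
np.add.at(X_dense, (ii, jj, kk), vals)   # accumulate duplicate coords
assert np.allclose(Y, np.einsum('ri,ijk->rjk', U, X_dense))
```

A similarly hedged sketch of the final kernel-selection step: a decision tree trained on precomputed tensor properties. The feature set, training data, and label encoding below are invented for illustration; the record does not list the paper's actual features.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-tensor features: [nnz, I, J, K, density].
features = [[20,  4,  5,  6,  20 / (4 * 5 * 6)],
            [50,  8,  8,  8,  50 / (8 * 8 * 8)],
            [300, 16, 16, 16, 300 / 16**3]]
# Label = fastest kernel measured on that tensor:
# 0 = SpMM with hierarchical layout, 1 = unfold + cuSPARSE SpMM, 2 = PASTA.
labels = [0, 2, 1]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, labels)
print(clf.predict([[35, 6, 6, 6, 35 / 216]]))  # predicted best kernel id
```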
Pages: 20
Related Papers
50 records in total
  • [41] Distributed-Memory Parallel Algorithms for Sparse Times Tall-Skinny-Dense Matrix Multiplication
    Selvitopi, Oguz
    Brock, Benjamin
    Nisa, Israt
    Tripathy, Alok
    Yelick, Katherine
    Buluc, Aydin
    PROCEEDINGS OF THE 2021 ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS 2021, 2021, : 431 - 442
  • [42] A DENSE GATE MATRIX LAYOUT METHOD FOR MOS VLSI
    LOPEZ, AD
    LAW, HFS
    IEEE TRANSACTIONS ON ELECTRON DEVICES, 1980, 27 (08) : 1671 - 1675
  • [43] A DENSE GATE MATRIX LAYOUT METHOD FOR MOS VLSI
    LOPEZ, AD
    LAW, HFS
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 1980, 15 (04) : 736 - 740
  • [44] The I/O Complexity of Sparse Matrix Dense Matrix Multiplication
    Greiner, Gero
    Jacob, Riko
    LATIN 2010: THEORETICAL INFORMATICS, 2010, 6034 : 143 - 156
  • [45] Register-based Implementation of the Sparse General Matrix-Matrix Multiplication on GPUs
    Liu, Junhong
    He, Xin
    Liu, Weifeng
    Tan, Guangming
    ACM SIGPLAN NOTICES, 2018, 53 (01) : 407 - 408
  • [46] TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs
    Niu, Yuyao
    Lu, Zhengyang
    Ji, Haonan
    Song, Shuhui
    Jin, Zhou
    Liu, Weifeng
    PPOPP'22: PROCEEDINGS OF THE 27TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2022, : 90 - 106
  • [47] Matrix and tensor completion using tensor ring decomposition with sparse representation
    Asante-Mensah, Maame G.
    Ahmadi-Asl, Salman
    Cichocki, Andrzej
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2021, 2 (03):
  • [48] Impact of Tensor Cores and Mixed Precision on the Reliability of Matrix Multiplication in GPUs
    Basso, Pedro Martins
    dos Santos, Fernando Fernandes
    Rech, Paolo
    IEEE TRANSACTIONS ON NUCLEAR SCIENCE, 2020, 67 (07) : 1560 - 1565
  • [49] Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications
    Ashari, Arash
    Sedaghati, Naser
    Eisenlohr, John
    Parthasarathy, Srinivasan
    Sadayappan, P.
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 781 - 792
  • [50] Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs
    Sedaghati, Naser
    Ashari, Arash
    Pouchet, Louis-Noel
    Parthasarathy, Srinivasan
    Sadayappan, P.
    2ND WORKSHOP ON PARALLEL PROGRAMMING FOR ANALYTICS APPLICATIONS (PPAA 2015), 2015, : 17 - 24