Characterization of data movement requirements for sparse matrix computations on GPUs

Cited by: 4
|
Authors
Kurt, Sureyya Emre [1 ]
Thumma, Vineeth [1 ]
Hong, Changwan [1 ]
Sukumaran-Rajam, Aravind [1 ]
Sadayappan, P. [1 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Science Foundation (USA);
Keywords
data-movement bounds; sparse matrix-vector multiplication (SpMV); sparse matrix-matrix multiplication (SpGEMM); graph analytics; hypergraph partitioning; GPU computing;
DOI
10.1109/HiPC.2017.00040
Chinese Library Classification (CLC)
TP3 [computing technology, computer technology];
Discipline code
0812;
Abstract
Tight data-movement lower bounds are known for dense matrix-vector multiplication and dense matrix-matrix multiplication, and practical GPU implementations achieve performance quite close to the roofline bounds implied by their operational intensity. For large dense matrices, matrix-vector multiplication is bandwidth-limited and its performance is significantly lower than that of matrix-matrix multiplication. In contrast, sparse matrix-matrix multiplication (SpGEMM) generally performs much worse than sparse matrix-vector multiplication (SpMV). In this paper, we combine lower-bound and upper-bound analyses of data movement requirements with hardware-counter-based measurements to gain insight into the performance limitations of existing SpGEMM implementations on GPUs. The analysis motivates an adaptive work-distribution strategy among threads that improves the performance of SpGEMM code on GPUs.
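The roofline reasoning in the abstract can be sketched numerically. The script below is a minimal illustration, not taken from the paper: the peak-compute and bandwidth figures are assumed round numbers, and the byte counts are textbook estimates for double-precision dense GEMV and CSR-format SpMV. It shows why both kernels have low operational intensity and therefore a bandwidth-limited performance ceiling far below peak compute.

```python
# Back-of-envelope roofline sketch: dense GEMV and CSR SpMV both have low
# operational intensity (flops per byte of data movement), so both are
# bandwidth-bound rather than compute-bound.
# The peak rates below are illustrative assumptions, not measured values.

PEAK_FLOPS = 7.0e12   # assumed peak double-precision rate (flop/s)
PEAK_BW = 0.9e12      # assumed memory bandwidth (byte/s)

def roofline_bound(intensity):
    """Attainable flop/s = min(peak compute, intensity * bandwidth)."""
    return min(PEAK_FLOPS, intensity * PEAK_BW)

def gemv_intensity(n):
    """Dense y = A x: 2*n^2 flops; traffic dominated by A (8 bytes/entry)."""
    flops = 2.0 * n * n
    bytes_moved = 8.0 * (n * n + 2 * n)  # A, x, y in double precision
    return flops / bytes_moved

def csr_spmv_intensity(nnz, n):
    """CSR SpMV: 2 flops per nonzero; each nonzero moves an 8-byte value
    and a 4-byte column index, plus row pointers and the x/y vectors."""
    flops = 2.0 * nnz
    bytes_moved = 12.0 * nnz + 4.0 * (n + 1) + 16.0 * n
    return flops / bytes_moved

if __name__ == "__main__":
    n, nnz = 10_000, 100_000
    for name, oi in [("dense GEMV", gemv_intensity(n)),
                     ("CSR SpMV", csr_spmv_intensity(nnz, n))]:
        print(f"{name}: intensity {oi:.3f} flop/byte, "
              f"bound {roofline_bound(oi) / 1e9:.0f} Gflop/s")
```

Under these assumptions both kernels sit well below 0.3 flop/byte, so their attainable rate is a small fraction of peak compute; SpGEMM's irregular output structure lowers effective intensity further, which is the gap the paper's analysis quantifies.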
Pages: 283-293
Page count: 11
Related papers
50 items total
  • [21] Unleashing the performance of bmSparse for the sparse matrix multiplication in GPUs
    Berger, Gonzalo
    Freire, Manuel
    Marini, Renzo
    Dufrechou, Ernesto
    Ezzatti, Pablo
    PROCEEDINGS OF SCALA 2021: 12TH WORKSHOP ON LATEST ADVANCES IN SCALABLE ALGORITHMS FOR LARGE-SCALE SYSTEMS, 2021, : 19 - 26
  • [22] Sparse Matrix-Vector Product for the bmSparse Matrix Format in GPUs
    Berger, Gonzalo
    Dufrechou, Ernesto
    Ezzatti, Pablo
    EURO-PAR 2023: PARALLEL PROCESSING WORKSHOPS, PT I, EURO-PAR 2023, 2024, 14351 : 246 - 256
  • [23] Automating Wavefront Parallelization for Sparse Matrix Computations
    Venkat, Anand
    Mohammadi, Mahdi Soltan
    Park, Jongsoo
    Rong, Hongbo
    Barik, Rajkishore
    Strout, Michelle Mills
    Hall, Mary
    SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2016, : 480 - 491
  • [24] Toward an automatic parallelization of sparse matrix computations
    Adle, R
    Aiguier, M
    Delaplace, F
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2005, 65 (03) : 313 - 330
  • [25] PREDICTING STRUCTURE IN SPARSE-MATRIX COMPUTATIONS
    GILBERT, JR
    SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 1994, 15 (01) : 62 - 79
  • [26] Sparse matrix computations for dynamic network centrality
    Arrigo, F.
    Higham, D. J.
    2017, Springer Science and Business Media Deutschland GmbH (02)
  • [27] Sparse matrix computations on manycore GPU's
    Garland, Michael
    2008 45TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2, 2008, : 2 - 6
  • [28] Adaptive Optimization for Sparse Data on Heterogeneous GPUs
    Ma, Yujing
    Rusu, Florin
    Wu, Kesheng
    Sim, Alexander
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 1088 - 1097
  • [29] ON FINDING SUPERNODES FOR SPARSE-MATRIX COMPUTATIONS
    LIU, JWH
    NG, EG
    PEYTON, BW
    SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 1993, 14 (01) : 242 - 252
  • [30] Modelling the cache performance of sparse matrix computations
    Rauber, T
    Scholtes, C
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V, 2000, : 2271 - 2277