TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs

Cited by: 36

Authors:

Niu, Yuyao [1]

Lu, Zhengyang [1]

Dong, Meichen [1]

Jin, Zhou [1]

Liu, Weifeng [1]

Tan, Guangming [2]

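Affiliations:

[1] China University of Petroleum, Super Scientific Software Laboratory, Beijing, People's Republic of China

[2] Chinese Academy of Sciences, Institute of Computing Technology, State Key Laboratory of Computer Architecture, Beijing, People's Republic of China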

Source:

2021 IEEE 35th International Parallel and Distributed Processing Symposium (IPDPS), 2021

Funding:

National Natural Science Foundation of China

Keywords:

sparse matrix-vector multiplication; tiling; GPU; SpMV; optimization; format

DOI:

10.1109/IPDPS49936.2021.00016

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

With the extensive use of GPUs in modern supercomputers, accelerating sparse matrix-vector multiplication (SpMV) on GPUs has received much attention over the last couple of decades. A number of techniques, such as increasing the utilization of wide vector units, reducing load imbalance and selecting the best formats, have been developed. However, the 2D spatial sparsity structure has not been well exploited in existing work on SpMV for GPUs. In this paper, we propose an efficient tiled algorithm called TileSpMV that optimizes SpMV on GPUs by exploiting the 2D spatial structure of sparse matrices. We first implement seven warp-level SpMV methods for computing sparse tiles stored in a variety of formats, and then design a selection method that finds the best format and SpMV implementation for each tile. We also adaptively extract the nonzeros of very sparse tiles into a separate matrix to maximize overall performance. The experimental results show that our method is faster than state-of-the-art SpMV methods such as Merge-SpMV, CSR5 and BSR on most matrices of the full SuiteSparse Matrix Collection, delivering up to 2.61x, 3.96x and 426.59x speedups, respectively.
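The abstract describes three steps: partition the matrix into 2D tiles, pick a storage format and kernel for each tile, and move the nonzeros of very sparse tiles into a separate matrix. The Python sketch below only illustrates that data flow on the CPU, under loudly stated assumptions: the 16x16 tile size (`TILE`), the extraction threshold (`EXTRACT_THRESHOLD`), the density-based `choose_format` rule and the three formats (dense, CSR, COO) are hypothetical stand-ins. It does not reproduce the paper's seven warp-level GPU kernels or its actual selection method.

```python
from collections import defaultdict

# Hypothetical parameters for this sketch, not values taken from the paper.
TILE = 16              # side length of a square tile
EXTRACT_THRESHOLD = 2  # tiles with fewer nonzeros are moved to a separate COO list


def build_tiles(rows, cols, vals, tile=TILE):
    """Group COO triplets by (tile_row, tile_col), storing tile-local indices."""
    tiles = defaultdict(list)
    for r, c, v in zip(rows, cols, vals):
        tiles[(r // tile, c // tile)].append((r % tile, c % tile, v))
    return tiles


def choose_format(entries, tile=TILE):
    """Toy density-based format choice (a stand-in, not the paper's selection method)."""
    if len(entries) / (tile * tile) >= 0.5:
        return "dense"
    if len({r for r, _, _ in entries}) <= tile // 4:
        return "coo"  # few occupied rows: a row-pointer array would be mostly empty
    return "csr"


def tile_spmv(fmt, entries, x, y, row0, col0, tile=TILE):
    """Multiply one tile by its slice of x and accumulate into y."""
    if fmt == "dense":
        block = [[0.0] * tile for _ in range(tile)]
        for r, c, v in entries:
            block[r][c] = v
        for r in range(min(tile, len(y) - row0)):  # clip tiles on the matrix edge
            y[row0 + r] += sum(block[r][c] * x[col0 + c]
                               for c in range(min(tile, len(x) - col0)))
    elif fmt == "csr":
        entries = sorted(entries)  # order by local row, then local column
        row_ptr = [0] * (tile + 1)
        for r, _, _ in entries:
            row_ptr[r + 1] += 1
        for r in range(tile):  # prefix sum turns per-row counts into offsets
            row_ptr[r + 1] += row_ptr[r]
        for r in range(tile):
            if row_ptr[r] == row_ptr[r + 1]:
                continue  # empty row (it may also lie past the matrix edge)
            y[row0 + r] += sum(entries[k][2] * x[col0 + entries[k][1]]
                               for k in range(row_ptr[r], row_ptr[r + 1]))
    else:  # "coo"
        for r, c, v in entries:
            y[row0 + r] += v * x[col0 + c]


def tiled_spmv(n_rows, rows, cols, vals, x, tile=TILE, threshold=EXTRACT_THRESHOLD):
    """Compute y = A x for A given as COO triplets, one tile at a time.

    Returns y and a count of how many tiles used each format.
    """
    y = [0.0] * n_rows
    extracted = []  # nonzeros pulled out of very sparse tiles
    formats = defaultdict(int)
    for (tr, tc), entries in build_tiles(rows, cols, vals, tile).items():
        row0, col0 = tr * tile, tc * tile
        if len(entries) < threshold:
            extracted.extend((row0 + r, col0 + c, v) for r, c, v in entries)
            continue
        fmt = choose_format(entries, tile)
        formats[fmt] += 1
        tile_spmv(fmt, entries, x, y, row0, col0, tile)
    for r, c, v in extracted:  # separate pass over the extracted matrix
        y[r] += v * x[c]
    return y, dict(formats)
```

In the setting the abstract describes, each tile would be processed on the GPU by a warp-level kernel matched to its format; the plain Python loop here stands in for that step. Keeping the extracted nonzeros in one flat list mirrors the idea of not paying per-tile overhead for tiles that hold only one or two values.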


Pages: 68-78

Page count: 11
