TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs

被引:36
|
作者
Niu, Yuyao [1 ]
Lu, Zhengyang [1 ]
Dong, Meichen [1 ]
Jin, Zhou [1 ]
Liu, Weifeng [1 ]
Tan, Guangming [2 ]
机构
[1] China Univ Petr, Super Sci Software Lab, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
基金
中国国家自然科学基金;
关键词
sparse matrix-vector multiplication; tiling; GPU; SPMV; OPTIMIZATION; FORMAT;
D O I
10.1109/IPDPS49936.2021.00016
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
With the extensive use of GPUs in modern supercomputers, accelerating sparse matrix-vector multiplication (SpMV) on GPUs received much attention in the last couple of decades. A number of techniques. such as increasing utilization of wide vector units, reducing load imbalance and selecting the best formats, have been developed. However, the 2D spatial sparsity structure has not been well exploited in the existing work for SpMV on GPUs. In this paper, we propose an efficient tiled algorithm called TileSpMV for optimizing SpMV on GPUs through exploiting 2D spatial structure of sparse matrices. We first implement seven warp-level SpMV methods for calculating sparse tiles stored in a variety of formats, and then design a selection method to find the best format and SpMV implementation for each tile. We also adaptively extract nonzeros in the very sparse tiles into a separate matrix to maximize the overall performance. The experimental results show that our method is faster than state-of-the-art SpMV methods such as Merge-SpMV, CSR5 and BSR in most matrices of the full SuiteSparse Matrix Collection and delivers up to 2.61x, 3.96x and 426.59x speedups, respectively.
引用
收藏
页码:68 / 78
页数:11
相关论文
共 50 条
  • [1] TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs
    Ji, Haonan
    Song, Huimin
    Lu, Shibo
    Jin, Zhou
    Tan, Guangming
    Liu, Weifeng
    51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022, 2022,
  • [2] Optimization techniques for sparse matrix-vector multiplication on GPUs
    Maggioni, Marco
    Berger-Wolf, Tanya
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2016, 93-94 : 66 - 86
  • [3] Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs
    Monakov, Alexander
    Avetisyan, Arutyun
    EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION, PROCEEDINGS, 2009, 5657 : 289 - 297
  • [4] Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs
    Feng, Xiaowen
    Jin, Hai
    Zheng, Ran
    Hu, Kan
    Zeng, Jingxiang
    Shao, Zhiyuan
    2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011, : 165 - 172
  • [5] Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs
    Tanabe, Noboru
    Ogawa, Yuuka
    Takata, Masami
    Joe, Kazuki
    PROCEEDINGS OF THE 19TH INTERNATIONAL EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING, 2011, : 101 - 108
  • [6] Multiple-precision sparse matrix-vector multiplication on GPUs
    Isupov, Konstantin
    JOURNAL OF COMPUTATIONAL SCIENCE, 2022, 61
  • [7] Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications
    Ashari, Arash
    Sedaghati, Naser
    Eisenlohr, John
    Parthasarathy, Srinivasan
    Sadayappan, P.
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 781 - 792
  • [8] Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs
    Sedaghati, Naser
    Ashari, Arash
    Pouchet, Louis-Noel
    Parthasarathy, Srinivasan
    Sadayappan, P.
    2ND WORKSHOP ON PARALLEL PROGRAMMING FOR ANALYTICS APPLICATIONS (PPAA 2015), 2015, : 17 - 24
  • [9] Dense and Sparse Matrix-Vector Multiplication on Maxwell GPUs with PyCUDA
    Nurudin Alvarez, Francisco
    Antonio Ortega-Toro, Jose
    Ujaldon, Manuel
    HIGH PERFORMANCE COMPUTING CARLA 2016, 2017, 697 : 219 - 229
  • [10] Iterative Sparse Matrix-Vector Multiplication for Integer Factorization on GPUs
    Schmidt, Bertil
    Aribowo, Hans
    Dang, Hoang-Vu
    EURO-PAR 2011 PARALLEL PROCESSING, PT 2, 2011, 6853 : 413 - 424