Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)

Cited by: 16
Authors
AlAhmadi, Sarah [1 ]
Mohammed, Thaha [2 ]
Albeshri, Aiiad [3 ]
Katib, Iyad [3 ]
Mehmood, Rashid [4 ]
Affiliations
[1] Taibah Univ, Dept Comp & Informat Sci, Medina 42353, Saudi Arabia
[2] Aalto Univ, Dept Comp Sci, Espoo 02150, Finland
[3] King Abdulaziz Univ, Dept Comp Sci, Jeddah 21589, Saudi Arabia
[4] King Abdulaziz Univ, High Performance Comp Ctr, Jeddah 21589, Saudi Arabia
Keywords
sparse matrix-vector multiplication (SpMV); high performance computing (HPC); sparse matrix storage; graphics processing units (GPUs); CSR; ELL; HYB; CSR5; parallelization; heterogeneous computing; BIG DATA; OPTIMIZATION; COMPUTATIONS; SYSTEMS; FORMAT; MODEL; TOOL;
DOI
10.3390/electronics9101675
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Graphics processing units (GPUs) have delivered remarkable performance for a variety of high performance computing (HPC) applications through massive parallelism. One such application is sparse matrix-vector multiplication (SpMV), which is central to many scientific, engineering, and other applications, including machine learning. No single SpMV storage or computation scheme provides consistently high performance across all matrices, due to their varying sparsity patterns. An extensive literature review reveals that the performance of SpMV techniques on GPUs has not been studied in sufficient detail. In this paper, we provide a detailed analysis of SpMV performance on GPUs using four notable sparse matrix storage schemes (compressed sparse row (CSR), ELLPACK (ELL), hybrid ELL/COO (HYB), and compressed sparse row 5 (CSR5)), five performance metrics (execution time, giga floating point operations per second (GFLOPS), achieved occupancy, instructions per warp, and warp execution efficiency), five matrix sparsity features (nnz, anpr, nprvariance, maxnpr, and distavg), and 17 sparse matrices from 10 application domains (chemical simulations, computational fluid dynamics (CFD), electromagnetics, linear programming, economics, etc.). Based on the deeper insights gained through this analysis, we propose a technique called the heterogeneous CPU-GPU Hybrid (HCGHYB) scheme. It utilizes both the CPU and GPU in parallel and outperforms the HYB format with an average speedup of 1.7x. Heterogeneous computing is an important direction for SpMV and other application areas. Moreover, to the best of our knowledge, this is the first work to discuss SpMV performance on GPUs in such depth. We believe that this performance analysis and the heterogeneous scheme will open up many new directions and improvements for the SpMV computing field.
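To make the storage schemes named in the abstract concrete, the following is a minimal sketch of the CSR (compressed sparse row) format and its serial SpMV kernel. CSR stores only the nonzeros in three arrays: the values, their column indices, and row pointers marking where each row's slice begins. The function and variable names here are illustrative, not taken from the paper; the GPU variants analyzed by the authors parallelize this same loop structure across rows.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  -- nonzero entries of A, row by row
    col_idx -- column index of each nonzero in `values`
    row_ptr -- row_ptr[i]:row_ptr[i+1] is row i's slice of `values`
    x       -- dense input vector
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Accumulate the dot product of row i's nonzeros with x.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 3x3 example matrix:  [[10, 0, 2],
#                       [ 0, 3, 0],
#                       [ 0, 0, 5]]
values = [10.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 2]
row_ptr = [0, 2, 3, 4]
print(csr_spmv(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 5.0]
```

ELL, by contrast, pads every row to the same number of nonzeros (which favors GPU memory coalescing but wastes space when row lengths vary), and HYB stores the regular part in ELL and overflow entries in COO; the paper's sparsity features such as anpr (average nonzeros per row) and nprvariance predict which trade-off wins.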
Pages: 1-30 (30 pages)