Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications

Cited by: 116
Authors
Ashari, Arash [1 ]
Sedaghati, Naser [1 ]
Eisenlohr, John [1 ]
Parthasarathy, Srinivasan [1 ]
Sadayappan, P. [1 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Source
SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS | 2014
Funding
US National Science Foundation;
Keywords
SpMV; GPU; CSR; HYB; ACSR;
DOI
10.1109/SC.2014.69
CLC Classification Number
TP301 [Theory, Methods];
Discipline Classification Code
081202;
Abstract
Sparse matrix-vector multiplication (SpMV) is a widely used computational kernel. The most commonly used format for a sparse matrix is CSR (Compressed Sparse Row), but a number of other representations have recently been developed that achieve higher SpMV performance. However, the alternative representations typically impose a significant preprocessing overhead. While a high preprocessing overhead can be amortized for applications requiring many iterative invocations of SpMV on the same matrix, this is not always feasible, for instance when analyzing large, dynamically evolving graphs. This paper presents ACSR, an adaptive SpMV algorithm that uses the standard CSR format but reduces thread divergence by combining rows into groups (bins) that have a similar number of non-zero elements. Further, for rows in bins spanning a wide range of non-zero counts, dynamic parallelism is leveraged. A significant benefit of ACSR over other proposed SpMV approaches is that it works directly with the standard CSR format and thus avoids significant preprocessing overheads. A CUDA implementation of ACSR is shown to outperform the SpMV implementations in the NVIDIA CUSP and cuSPARSE libraries on a set of sparse matrices representing power-law graphs. We also demonstrate the use of ACSR for the analysis of dynamic graphs, where the improvement over extant approaches is even higher.
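The binning idea the abstract describes can be made concrete with a short CUDA sketch. Everything below is a reconstruction from the abstract alone and is not the authors' ACSR code: the power-of-two bin boundaries, the one-thread-per-row kernel, and all identifiers (spmv_csr_bin, bin_rows, and so on) are illustrative assumptions, and the paper's dynamic-parallelism path for bins spanning a wide range of non-zero counts is omitted.

    // Hypothetical sketch of CSR row binning for SpMV (not the authors' ACSR code).
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // One thread per row. Each launch covers a single bin, so rows handled
    // by the same warp have similar non-zero counts, limiting divergence.
    __global__ void spmv_csr_bin(const int *rowPtr, const int *colIdx,
                                 const float *val, const float *x, float *y,
                                 const int *binRows, int binSize) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= binSize) return;
        int row = binRows[i];
        float sum = 0.0f;
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            sum += val[j] * x[colIdx[j]];
        y[row] = sum;
    }

    // Group row ids by bin index b = ceil(log2(nnz)); a single linear scan
    // of rowPtr. The power-of-two rule is an assumption for illustration.
    std::vector<std::vector<int>> bin_rows(const std::vector<int> &rowPtr) {
        std::vector<std::vector<int>> bins(32);
        for (int r = 0; r + 1 < (int)rowPtr.size(); ++r) {
            int nnz = rowPtr[r + 1] - rowPtr[r];
            int b = 0;
            while ((1 << b) < nnz) ++b;
            bins[b].push_back(r);
        }
        return bins;
    }

    int main() {
        // 3x3 example in CSR: [[1 0 2], [0 3 0], [4 5 6]], x = (1, 1, 1).
        std::vector<int>   rowPtr = {0, 2, 3, 6};
        std::vector<int>   colIdx = {0, 2, 1, 0, 1, 2};
        std::vector<float> val    = {1, 2, 3, 4, 5, 6};
        std::vector<float> x      = {1, 1, 1}, y(3, 0);

        int *dRp, *dCi, *dBin;
        float *dV, *dX, *dY;
        cudaMalloc(&dRp, 4 * sizeof(int));
        cudaMalloc(&dCi, 6 * sizeof(int));
        cudaMalloc(&dBin, 3 * sizeof(int));
        cudaMalloc(&dV, 6 * sizeof(float));
        cudaMalloc(&dX, 3 * sizeof(float));
        cudaMalloc(&dY, 3 * sizeof(float));
        cudaMemcpy(dRp, rowPtr.data(), 4 * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(dCi, colIdx.data(), 6 * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(dV, val.data(), 6 * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dX, x.data(), 3 * sizeof(float), cudaMemcpyHostToDevice);

        // One launch per non-empty bin; every row belongs to exactly one
        // bin, so all of y is written.
        for (const auto &bin : bin_rows(rowPtr)) {
            if (bin.empty()) continue;
            int n = (int)bin.size();
            cudaMemcpy(dBin, bin.data(), n * sizeof(int), cudaMemcpyHostToDevice);
            spmv_csr_bin<<<(n + 255) / 256, 256>>>(dRp, dCi, dV, dX, dY, dBin, n);
            // Default-stream ordering already serializes the reuse of dBin;
            // the explicit sync just makes that obvious.
            cudaDeviceSynchronize();
        }
        cudaMemcpy(y.data(), dY, 3 * sizeof(float), cudaMemcpyDeviceToHost);
        printf("y = %.0f %.0f %.0f\n", y[0], y[1], y[2]);   // expect 3 3 15
        return 0;
    }

Because the binning pass is a single scan of the CSR row pointers, it can be repeated cheaply each time the graph changes; this is the kind of low preprocessing cost the abstract contrasts with alternative formats such as HYB.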
Pages: 781 - 792
Number of pages: 12