Sparse Matrix-Vector Multiplication Optimizations based on Matrix Bandwidth Reduction using NVIDIA CUDA

Cited by: 9
Authors
Xu, Shiming [1]
Lin, Hai Xiang [1]
Xue, Wei [2]
Affiliations
[1] Delft Univ Technol, Delft Inst Appl Math, Delft, Netherlands
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE NINTH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS TO BUSINESS, ENGINEERING AND SCIENCE (DCABES 2010) | 2010
Keywords
SpMV; GP-GPU; NVIDIA CUDA; RCM;
DOI
10.1109/DCABES.2010.162
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In this paper we propose optimizing sparse matrix-vector multiplication (SpMV) with CUDA using matrix bandwidth/profile reduction techniques. The computational time required to access the dense vector is decoupled from the SpMV computation. By reducing the matrix profile, the time required to access the dense vector is reduced by 17% (for SP) and 24% (for DP). The reduced matrix bandwidth enables column index information to be compressed into shorter formats, resulting in a 17% (for SP) and 10% (for DP) reduction in execution time for accessing matrix data under the ELLPACK format. The overall speedup for SpMV is 16% and 12.6% for the whole matrix test suite. The optimization proposed in this paper can be combined with other SpMV optimizations such as register blocking.
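The link between a small bandwidth and shorter column indices can be illustrated with a minimal sketch. It is not the paper's implementation: it is plain Python rather than CUDA, it assumes a square matrix that has already been reordered (for example with RCM, as listed in the keywords), and the delta encoding (storing `j - i` instead of `j`) and the function names `bandwidth`, `to_ellpack_delta` and `spmv_ellpack_delta` are hypothetical choices made only for this example.

```python
from array import array


def bandwidth(rows):
    """Largest |i - j| over stored nonzeros; rows[i] is a list of (col, value)."""
    return max((abs(i - j) for i, row in enumerate(rows) for j, _ in row), default=0)


def to_ellpack_delta(rows):
    """Pack a square sparse matrix into ELLPACK with column offsets j - i.

    Assumption for this sketch: offsets use a 16-bit signed type when the
    bandwidth fits, and a wider signed type otherwise.
    """
    n = len(rows)
    width = max((len(r) for r in rows), default=0)
    typecode = "h" if bandwidth(rows) <= 32767 else "l"
    # Padding slots keep offset 0 and value 0.0, so they read x[i] and add nothing.
    deltas = array(typecode, [0] * (n * width))
    values = array("d", [0.0] * (n * width))
    for i, row in enumerate(rows):
        for k, (j, v) in enumerate(row):
            # Column-major layout: slot k of every row is stored contiguously.
            deltas[k * n + i] = j - i
            values[k * n + i] = v
    return width, deltas, values


def spmv_ellpack_delta(n, width, deltas, values, x):
    """Compute y = A @ x; the outer loop plays the role of one GPU thread per row."""
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(width):
            idx = k * n + i
            acc += values[idx] * x[i + deltas[idx]]
        y[i] = acc
    return y


# A 4x4 tridiagonal matrix: bandwidth 1, so every offset is -1, 0 or +1.
rows = [
    [(0, 2.0), (1, -1.0)],
    [(0, -1.0), (1, 2.0), (2, -1.0)],
    [(1, -1.0), (2, 2.0), (3, -1.0)],
    [(2, -1.0), (3, 2.0)],
]
x = [1.0, 2.0, 3.0, 4.0]
width, deltas, values = to_ellpack_delta(rows)
y = spmv_ellpack_delta(len(rows), width, deltas, values, x)
print(bandwidth(rows), deltas.typecode, y)
```

In this sketch the narrower index type only becomes available because reordering keeps every nonzero close to the diagonal; on a matrix with large bandwidth the same code falls back to the wider type.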
Pages: 609-614
Page count: 6