An Effective Approach for Implementing Sparse Matrix-Vector Multiplication on Graphics Processing Units

Cited by: 10
Authors
Abu-Sufah, Walid [1]
Karim, Asma Abdel [2]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Univ Jordan, Dept Commun Engn, Amman, Jordan
Source
2012 IEEE 14th International Conference on High Performance Computing and Communications & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS) | 2012
Keywords
GPU; CUDA; sparse linear algebra; SpMV
DOI
10.1109/HPCC.2012.68
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Sparse matrix-vector multiplication (SpMV) is often a performance bottleneck in iterative solvers. Recently, Graphics Processing Units (GPUs) have been deployed to accelerate this operation. We present BTJAD, a blocked version of the Transposed Jagged Diagonal storage format tailored for GPUs. We develop a highly optimized SpMV kernel that exploits the properties of the BTJAD format and reuses source-vector values already loaded into GPU registers. Using 62 matrices with different sparsity patterns on an NVIDIA Tesla T10 GPU, we compare the performance of our kernel with that of the SpMV kernels in NVIDIA's library. Our kernel achieves superior execution throughput for matrices whose nonzero row lengths are non-uniform, outperforming the best available kernels by up to 4.67x. On the Fermi-class GeForce GTX480 GPU, which has a larger register file, the maximum speedup of our kernel rises to 6.6x.
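The abstract names the storage format but does not describe its layout. As a rough point of reference only, here is a minimal Python sketch of the classic jagged diagonal (JAD) layout that the format's name refers to: rows are permuted into decreasing order of nonzero count, and the k-th nonzero of every row that has one is stored contiguously as the k-th jagged diagonal. It assumes a small dense input for illustration. The names `JaggedDiagonalMatrix`, `to_jad`, and `jad_spmv` are hypothetical. The sketch does not reproduce the paper's blocking or its register reuse of source-vector values, and it does not claim that TJAD matches this layout in every detail.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical illustration of the classic JAD layout; not the paper's BTJAD kernel.


@dataclass
class JaggedDiagonalMatrix:
    """Jagged diagonal (JAD) storage of a sparse matrix.

    perm[i]    original row index of the i-th row after sorting by length
    jd_ptr[k]  offset in values/cols where jagged diagonal k starts
    values     nonzeros, stored one jagged diagonal after another
    cols       column index of each stored nonzero
    """
    n_rows: int
    perm: List[int]
    jd_ptr: List[int]
    values: List[float]
    cols: List[int]


def to_jad(dense: Sequence[Sequence[float]]) -> JaggedDiagonalMatrix:
    # Compress each row to its (column, value) nonzeros.
    rows = [[(j, v) for j, v in enumerate(r) if v != 0] for r in dense]
    # Sort rows by decreasing nonzero count (stable, so ties keep their order).
    perm = sorted(range(len(rows)), key=lambda i: -len(rows[i]))
    max_len = len(rows[perm[0]]) if perm else 0
    jd_ptr, values, cols = [0], [], []
    for k in range(max_len):
        # Jagged diagonal k holds the k-th nonzero of every sorted row that
        # has more than k nonzeros; those rows form a prefix of perm.
        for i in perm:
            if len(rows[i]) <= k:
                break
            j, v = rows[i][k]
            cols.append(j)
            values.append(v)
        jd_ptr.append(len(values))
    return JaggedDiagonalMatrix(len(rows), perm, jd_ptr, values, cols)


def jad_spmv(a: JaggedDiagonalMatrix, x: Sequence[float]) -> List[float]:
    """Compute y = A x from the JAD layout."""
    y_sorted = [0.0] * a.n_rows
    for k in range(len(a.jd_ptr) - 1):
        start, end = a.jd_ptr[k], a.jd_ptr[k + 1]
        # Entry idx of diagonal k belongs to sorted row (idx - start). On a GPU,
        # mapping sorted row i to thread i makes neighbouring threads read
        # neighbouring entries of values/cols.
        for i, idx in enumerate(range(start, end)):
            y_sorted[i] += a.values[idx] * x[a.cols[idx]]
    # Undo the row permutation.
    y = [0.0] * a.n_rows
    for i, orig in enumerate(a.perm):
        y[orig] = y_sorted[i]
    return y
```

For A = [[4, 0, 1, 0], [0, 0, 0, 0], [2, 3, 0, 5], [0, 6, 0, 0]] and x = [1, 2, 3, 4], `jad_spmv(to_jad(A), x)` returns [7.0, 0.0, 28.0, 12.0]. Because rows are sorted by length, each jagged diagonal covers only a prefix of the sorted rows, so no padding is stored for short rows.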
Pages: 453-460
Number of pages: 8