SMAT: An Input Adaptive Auto-Tuner for Sparse Matrix-Vector Multiplication

Cited by: 102
Authors
Li, Jiajia [1 ]
Tan, Guangming [1 ]
Chen, Mingyu [1 ]
Sun, Ninghui [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Sta Key Lab Comp Architecture, Beijing 100864, Peoples R China
Keywords
Algorithms; Performance; sparse matrix-vector multiplication; SpMV; auto-tuning; data mining; algebraic multi-grid; LIBRARY; MODEL;
DOI
10.1145/2499370.2462181
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Sparse Matrix-Vector multiplication (SpMV) is an important kernel in both traditional high performance computing and emerging data-intensive applications. By far, SpMV libraries are optimized by either application-specific or architecture-specific approaches, making the libraries too complicated to be used extensively in real applications. In this work we develop a Sparse Matrix-vector multiplication Auto-Tuning system (SMAT) to bridge the gap between specific optimizations and general-purpose usage. SMAT provides users with a unified programming interface in compressed sparse row (CSR) format and automatically determines the optimal format and implementation for any input sparse matrix at runtime. For this purpose, SMAT leverages a learning model, which is generated in an off-line stage by a machine learning method with a training set of more than 2000 matrices from the UF sparse matrix collection, to quickly predict the best combination of the matrix feature parameters. Our experiments show that SMAT achieves impressive performance of up to 51 GFLOPS in single-precision and 37 GFLOPS in double-precision on mainstream x86 multi-core processors, which are both more than 3 times faster than the Intel MKL library. We also demonstrate its adaptability in an algebraic multi-grid solver from the Hypre library with above 20% performance improvement reported.
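For readers unfamiliar with the CSR format named in the abstract, the following is a minimal sketch of an untuned CSR SpMV kernel in plain Python. It is not SMAT's interface or implementation; the function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are assumptions chosen for illustration.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i] .. row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (Hypothetical names, not taken from the SMAT paper.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix:
# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

An auto-tuner of the kind the abstract describes would accept input in this layout and may convert it internally to another storage format before running the multiplication; that selection step is not shown here.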
Pages: 117-126
Page count: 10