Optimization techniques for sparse matrix-vector multiplication on GPUs

Cited by: 17
Authors
Maggioni, Marco [1 ]
Berger-Wolf, Tanya [1 ]
Institutions
[1] Univ Illinois, Dept Comp Sci, 851 S Morgan, Room 1120 SEO, Chicago, IL 60607 USA
Funding
National Science Foundation (NSF);
Keywords
SpMV; Optimization; GPU; Adaptive; AdELL; Blocking; Compression; Unrolling; Auto-tuning;
DOI
10.1016/j.jpdc.2016.03.011
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Subject classification code
081202;
Abstract
Sparse linear algebra is fundamental to numerous areas of applied mathematics, science and engineering. In this paper, we propose an efficient data structure named AdELL+ for optimizing the sparse matrix-vector multiplication (SpMV) kernel on GPUs, focusing on the performance bottlenecks of sparse computation. The foundation of our work is an ELL-based adaptive format that copes with matrix irregularity using balanced warps composed by a parametrized warp-balancing heuristic. We also address the intrinsically bandwidth-limited nature of SpMV with warp granularity, blocking, delta compression and nonzero unrolling, targeting both memory footprint and memory-hierarchy efficiency. Finally, we introduce a novel online auto-tuning approach that uses a quality metric to predict efficient block factors and hides preprocessing overhead behind useful SpMV computation. Our experimental results show that AdELL+ achieves performance comparable to or better than other state-of-the-art SpMV formats proposed in academia (BCCOO) and industry (CSR+ and CSR-Adaptive). Moreover, our auto-tuning approach makes AdELL+ viable for real-world applications. (C) 2016 Elsevier Inc. All rights reserved.
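For context, the ELL (ELLPACK) layout that AdELL+ builds on pads every row to a fixed width and stores the arrays column-major, so that the threads of a warp access memory contiguously. The sketch below is a minimal one-thread-per-row ELL SpMV kernel in CUDA; it is not the paper's AdELL+ implementation, and the kernel name, parameter names and the -1 padding convention are illustrative assumptions.

```cuda
// Minimal ELL-format SpMV sketch (illustrative; not the AdELL+ kernel).
// vals/cols are n_rows x ell_width arrays stored column-major so that
// consecutive threads (rows) in a warp read consecutive addresses.
__global__ void spmv_ell(int n_rows, int ell_width,
                         const int    *__restrict__ cols,  // padded with -1
                         const double *__restrict__ vals,  // padded with 0.0
                         const double *__restrict__ x,
                         double       *__restrict__ y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows) return;

    double sum = 0.0;
    for (int j = 0; j < ell_width; ++j) {
        int c = cols[j * n_rows + row];   // coalesced across the warp
        if (c >= 0)                       // skip padding entries
            sum += vals[j * n_rows + row] * x[c];
    }
    y[row] = sum;
}

// Launch sketch (hypothetical device pointers d_cols, d_vals, d_x, d_y):
// spmv_ell<<<(n_rows + 255) / 256, 256>>>(n_rows, w, d_cols, d_vals, d_x, d_y);
```

The weakness this paper targets is visible in the sketch: ell_width is dictated by the longest row, so a single dense row inflates padding and leaves threads idle. The balanced warps, blocking, delta compression of column indices and nonzero unrolling described in the abstract all attack this padding and memory-bandwidth overhead.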
Pages: 66-86
Number of pages: 21
Related Papers
50 records in total
  • [1] Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs
    Pichel, Juan C.
    Rivera, Francisco F.
    Fernandez, Marcos
    Rodriguez, Aurelio
    MICROPROCESSORS AND MICROSYSTEMS, 2012, 36 (02) : 65 - 77
  • [2] Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs
    Feng, Xiaowen
    Jin, Hai
    Zheng, Ran
    Hu, Kan
    Zeng, Jingxiang
    Shao, Zhiyuan
    2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011, : 165 - 172
  • [3] A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs
    Guo, Ping
    Wang, Liqiang
    Chen, Po
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (05) : 1112 - 1123
  • [4] Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs
    Monakov, Alexander
    Avetisyan, Arutyun
    EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION, PROCEEDINGS, 2009, 5657 : 289 - 297
  • [5] Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs
    Tanabe, Noboru
    Ogawa, Yuuka
    Takata, Masami
    Joe, Kazuki
    PROCEEDINGS OF THE 19TH INTERNATIONAL EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING, 2011, : 101 - 108
  • [6] Multiple-precision sparse matrix-vector multiplication on GPUs
    Isupov, Konstantin
    JOURNAL OF COMPUTATIONAL SCIENCE, 2022, 61
  • [7] Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications
    Ashari, Arash
    Sedaghati, Naser
    Eisenlohr, John
    Parthasarathy, Srinivasan
    Sadayappan, P.
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 781 - 792
  • [8] Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs
    Sedaghati, Naser
    Ashari, Arash
    Pouchet, Louis-Noel
    Parthasarathy, Srinivasan
    Sadayappan, P.
    2ND WORKSHOP ON PARALLEL PROGRAMMING FOR ANALYTICS APPLICATIONS (PPAA 2015), 2015, : 17 - 24
  • [9] Dense and Sparse Matrix-Vector Multiplication on Maxwell GPUs with PyCUDA
    Nurudin Alvarez, Francisco
    Antonio Ortega-Toro, Jose
    Ujaldon, Manuel
    HIGH PERFORMANCE COMPUTING CARLA 2016, 2017, 697 : 219 - 229
  • [10] TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs
    Niu, Yuyao
    Lu, Zhengyang
    Dong, Meichen
    Jin, Zhou
    Liu, Weifeng
    Tan, Guangming
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 68 - 78