Optimization techniques for sparse matrix-vector multiplication on GPUs

Cited by: 17
Authors
Maggioni, Marco [1 ]
Berger-Wolf, Tanya [1 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, 851 S Morgan,Room 1120 SEO, Chicago, IL 60607 USA
Funding
U.S. National Science Foundation;
Keywords
SpMV; Optimization; GPU; Adaptive; AdELL; Blocking; Compression; Unrolling; Auto-tuning;
DOI
10.1016/j.jpdc.2016.03.011
Chinese Library Classification
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
Sparse linear algebra is fundamental to numerous areas of applied mathematics, science and engineering. In this paper, we propose an efficient data structure named AdELL+ for optimizing the SpMV kernel on GPUs, focusing on performance bottlenecks of sparse computation. The foundation of our work is an ELL-based adaptive format which copes with matrix irregularity using balanced warps composed using a parametrized warp-balancing heuristic. We also address the intrinsic bandwidth-limited nature of SpMV with warp granularity, blocking, delta compression and nonzero unrolling, targeting both memory footprint and memory hierarchy efficiency. Finally, we introduce a novel online auto-tuning approach that uses a quality metric to predict efficient block factors and that hides preprocessing overhead with useful SpMV computation. Our experimental results show that AdELL+ achieves comparable or better performance over other state-of-the-art SpMV sparse formats proposed in academia (BCCOO) and industry (CSR+ and CSR-Adaptive). Moreover, our auto-tuning approach makes AdELL+ viable for real-world applications. (C) 2016 Elsevier Inc. All rights reserved.
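To make the abstract's starting point concrete, the following is a minimal NumPy sketch of the plain ELL format that AdELL+ builds on: each row is padded to the width of the densest row, so values and column indices become rectangular arrays that map naturally onto GPU threads. This is an illustrative baseline only, not the paper's adaptive AdELL+ format, and the helper names (`to_ell`, `ell_spmv`) are hypothetical.

```python
import numpy as np

def to_ell(dense):
    """Convert a dense matrix to ELL format (illustrative helper).

    Each row's nonzeros are packed to the left; every row is padded
    with zero values and column index 0 up to the maximum row length.
    """
    rows, _ = dense.shape
    row_cols = [np.flatnonzero(dense[i]) for i in range(rows)]
    width = max(len(c) for c in row_cols)  # width of the densest row
    cols = np.zeros((rows, width), dtype=int)
    vals = np.zeros((rows, width))
    for i, c in enumerate(row_cols):
        cols[i, :len(c)] = c
        vals[i, :len(c)] = dense[i, c]
    return vals, cols

def ell_spmv(vals, cols, x):
    """y[i] = sum_k vals[i, k] * x[cols[i, k]]; padded slots add 0."""
    return (vals * x[cols]).sum(axis=1)

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 0.0]])
vals, cols = to_ell(A)
y = ell_spmv(vals, cols, np.array([1.0, 2.0, 3.0]))
```

The weakness the paper targets is visible here: one long row forces padding onto every other row, wasting memory and GPU work on irregular matrices, which is what the adaptive warp-balancing heuristic in AdELL+ addresses.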
Pages: 66-86
Page count: 21
Related Papers
50 total
  • [31] Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)
    AlAhmadi, Sarah
    Mohammed, Thaha
    Albeshri, Aiiad
    Katib, Iyad
    Mehmood, Rashid
    ELECTRONICS, 2020, 9 (10) : 1 - 30
  • [32] Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression
    Boukaram, Wajih
    Turkiyyah, George
    Keyes, David
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2019, 45 (01):
  • [33] GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication
    Tao, Yuan
    Deng, Yangdong
    Mu, Shuai
    Zhang, Zhenzhong
    Zhu, Mingfa
    Xiao, Limin
    Ruan, Li
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (14): : 3771 - 3789
  • [34] Vector ISA extension for sparse matrix-vector multiplication
    Vassiliadis, S
    Cotofana, S
    Stathis, P
    EURO-PAR'99: PARALLEL PROCESSING, 1999, 1685 : 708 - 715
  • [35] Understanding the performance of sparse matrix-vector multiplication
    Goumas, Georgios
    Kourtis, Kornilios
    Anastopoulos, Nikos
    Karakasis, Vasileios
    Koziris, Nectarios
    PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008, : 283 - +
  • [36] Sparse matrix-vector multiplication design on FPGAs
    Sun, Junqing
    Peterson, Gregory
    Storaasli, Olaf
    FCCM 2007: 15TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2007, : 349 - +
  • [37] Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer
    DuBois, David
    DuBois, Andrew
    Connor, Carolyn
    Poole, Steve
    PROCEEDINGS OF THE SIXTEENTH IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, 2008, : 239 - +
  • [38] Implementation and optimization of sparse matrix-vector multiplication on imagine stream processor
    Wang, Li
    Yang, Xue Jun
    Wang, Gui Bin
    Yan, Xiao Bo
    Deng, Yu
    Du, Jing
    Zhang, Ying
    Tang, Tao
    Zeng, Kun
    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS, 2007, 4742 : 44 - 55
  • [39] Node aware sparse matrix-vector multiplication
    Bienz, Amanda
    Gropp, William D.
    Olson, Luke N.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, 130 : 166 - 178
  • [40] STRUCTURED SPARSE MATRIX-VECTOR MULTIPLICATION ON A MASPAR
    DEHN, T
    EIERMANN, M
    GIEBERMANN, K
    SPERLING, V
    ZEITSCHRIFT FUR ANGEWANDTE MATHEMATIK UND MECHANIK, 1994, 74 (06): : T534 - T538