Optimization techniques for sparse matrix-vector multiplication on GPUs

Cited by: 17
Authors
Maggioni, Marco [1 ]
Berger-Wolf, Tanya [1 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, 851 S Morgan,Room 1120 SEO, Chicago, IL 60607 USA
Funding
U.S. National Science Foundation;
Keywords
SpMV; Optimization; GPU; Adaptive; AdELL; Blocking; Compression; Unrolling; Auto-tuning;
DOI
10.1016/j.jpdc.2016.03.011
Chinese Library Classification
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
Sparse linear algebra is fundamental to numerous areas of applied mathematics, science and engineering. In this paper, we propose an efficient data structure named AdELL+ for optimizing the SpMV kernel on GPUs, focusing on performance bottlenecks of sparse computation. The foundation of our work is an ELL-based adaptive format which copes with matrix irregularity using balanced warps composed using a parametrized warp-balancing heuristic. We also address the intrinsic bandwidth-limited nature of SpMV with warp granularity, blocking, delta compression and nonzero unrolling, targeting both memory footprint and memory hierarchy efficiency. Finally, we introduce a novel online auto-tuning approach that uses a quality metric to predict efficient block factors and that hides preprocessing overhead with useful SpMV computation. Our experimental results show that AdELL+ achieves comparable or better performance over other state-of-the-art SpMV sparse formats proposed in academia (BCCOO) and industry (CSR+ and CSR-Adaptive). Moreover, our auto-tuning approach makes AdELL+ viable for real-world applications. (C) 2016 Elsevier Inc. All rights reserved.
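To make the abstract's starting point concrete, the following is a minimal NumPy sketch of the plain ELL format that AdELL+ builds on: each row is padded to the width of the densest row, so values and column indices become rectangular arrays that map naturally onto GPU threads. This is an illustrative baseline only, not the paper's adaptive AdELL+ format, and the helper names (`to_ell`, `ell_spmv`) are hypothetical.

```python
import numpy as np

def to_ell(dense):
    """Convert a dense matrix to ELL format (illustrative helper).

    Each row's nonzeros are packed to the left; every row is padded
    with zero values and column index 0 up to the maximum row length.
    """
    rows, _ = dense.shape
    row_cols = [np.flatnonzero(dense[i]) for i in range(rows)]
    width = max(len(c) for c in row_cols)  # width of the densest row
    cols = np.zeros((rows, width), dtype=int)
    vals = np.zeros((rows, width))
    for i, c in enumerate(row_cols):
        cols[i, :len(c)] = c
        vals[i, :len(c)] = dense[i, c]
    return vals, cols

def ell_spmv(vals, cols, x):
    """y[i] = sum_k vals[i, k] * x[cols[i, k]]; padded slots add 0."""
    return (vals * x[cols]).sum(axis=1)

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 0.0]])
vals, cols = to_ell(A)
y = ell_spmv(vals, cols, np.array([1.0, 2.0, 3.0]))
```

The weakness the paper targets is visible here: one long row forces padding onto every other row, wasting memory and GPU work on irregular matrices, which is what the adaptive warp-balancing heuristic in AdELL+ addresses.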
Pages: 66-86
Page count: 21
Related Papers
50 total
  • [31] Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)
    AlAhmadi, Sarah
    Mohammed, Thaha
    Albeshri, Aiiad
    Katib, Iyad
    Mehmood, Rashid
    ELECTRONICS, 2020, 9 (10) : 1 - 30
  • [32] Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression
    Boukaram, Wajih
    Turkiyyah, George
    Keyes, David
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2019, 45 (01):
  • [33] GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication
    Tao, Yuan
    Deng, Yangdong
    Mu, Shuai
    Zhang, Zhenzhong
    Zhu, Mingfa
    Xiao, Limin
    Ruan, Li
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (14): : 3771 - 3789
  • [34] Vector ISA extension for sparse matrix-vector multiplication
    Vassiliadis, S
    Cotofana, S
    Stathis, P
    EURO-PAR'99: PARALLEL PROCESSING, 1999, 1685 : 708 - 715
  • [35] Understanding the performance of sparse matrix-vector multiplication
    Goumas, Georgios
    Kourtis, Kornilios
    Anastopoulos, Nikos
    Karakasis, Vasileios
    Koziris, Nectarios
    PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008, : 283 - +
  • [36] Sparse matrix-vector multiplication design on FPGAs
    Sun, Junqing
    Peterson, Gregory
    Storaasli, Olaf
    FCCM 2007: 15TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2007, : 349 - +
  • [37] Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer
    DuBois, David
    DuBois, Andrew
    Connor, Carolyn
    Poole, Steve
    PROCEEDINGS OF THE SIXTEENTH IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, 2008, : 239 - +
  • [38] Implementation and optimization of sparse matrix-vector multiplication on imagine stream processor
    Wang, Li
    Yang, Xue Jun
    Wang, Gui Bin
    Yan, Xiao Bo
    Deng, Yu
    Du, Jing
    Zhang, Ying
    Tang, Tao
    Zeng, Kun
    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS, 2007, 4742 : 44 - 55
  • [39] Node aware sparse matrix-vector multiplication
    Bienz, Amanda
    Gropp, William D.
    Olson, Luke N.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, 130 : 166 - 178
  • [40] STRUCTURED SPARSE MATRIX-VECTOR MULTIPLICATION ON A MASPAR
    DEHN, T
    EIERMANN, M
    GIEBERMANN, K
    SPERLING, V
    ZEITSCHRIFT FUR ANGEWANDTE MATHEMATIK UND MECHANIK, 1994, 74 (06): : T534 - T538