Optimization techniques for sparse matrix-vector multiplication on GPUs

Cited by: 17
Authors
Maggioni, Marco [1 ]
Berger-Wolf, Tanya [1 ]
Institutions
[1] Univ Illinois, Dept Comp Sci, 851 S Morgan, Room 1120 SEO, Chicago, IL 60607 USA
Funding
National Science Foundation (NSF);
Keywords
SpMV; Optimization; GPU; Adaptive; AdELL; Blocking; Compression; Unrolling; Auto-tuning;
DOI
10.1016/j.jpdc.2016.03.011
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Subject classification code
081202;
Abstract
Sparse linear algebra is fundamental to numerous areas of applied mathematics, science and engineering. In this paper, we propose an efficient data structure named AdELL+ for optimizing the sparse matrix-vector multiplication (SpMV) kernel on GPUs, focusing on the performance bottlenecks of sparse computation. The foundation of our work is an ELL-based adaptive format that copes with matrix irregularity using balanced warps composed by a parametrized warp-balancing heuristic. We also address the intrinsically bandwidth-limited nature of SpMV with warp granularity, blocking, delta compression and nonzero unrolling, targeting both memory footprint and memory-hierarchy efficiency. Finally, we introduce a novel online auto-tuning approach that uses a quality metric to predict efficient block factors and hides preprocessing overhead behind useful SpMV computation. Our experimental results show that AdELL+ achieves performance comparable to or better than other state-of-the-art SpMV formats proposed in academia (BCCOO) and industry (CSR+ and CSR-Adaptive). Moreover, our auto-tuning approach makes AdELL+ viable for real-world applications. (C) 2016 Elsevier Inc. All rights reserved.
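For context, the ELL (ELLPACK) layout that AdELL+ builds on pads every row to a fixed width and stores the arrays column-major, so that the threads of a warp access memory contiguously. The sketch below is a minimal one-thread-per-row ELL SpMV kernel in CUDA; it is not the paper's AdELL+ implementation, and the kernel name, parameter names and the -1 padding convention are illustrative assumptions.

```cuda
// Minimal ELL-format SpMV sketch (illustrative; not the AdELL+ kernel).
// vals/cols are n_rows x ell_width arrays stored column-major so that
// consecutive threads (rows) in a warp read consecutive addresses.
__global__ void spmv_ell(int n_rows, int ell_width,
                         const int    *__restrict__ cols,  // padded with -1
                         const double *__restrict__ vals,  // padded with 0.0
                         const double *__restrict__ x,
                         double       *__restrict__ y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows) return;

    double sum = 0.0;
    for (int j = 0; j < ell_width; ++j) {
        int c = cols[j * n_rows + row];   // coalesced across the warp
        if (c >= 0)                       // skip padding entries
            sum += vals[j * n_rows + row] * x[c];
    }
    y[row] = sum;
}

// Launch sketch (hypothetical device pointers d_cols, d_vals, d_x, d_y):
// spmv_ell<<<(n_rows + 255) / 256, 256>>>(n_rows, w, d_cols, d_vals, d_x, d_y);
```

The weakness this paper targets is visible in the sketch: ell_width is dictated by the longest row, so a single dense row inflates padding and leaves threads idle. The balanced warps, blocking, delta compression of column indices and nonzero unrolling described in the abstract all attack this padding and memory-bandwidth overhead.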
Pages: 66-86
Number of pages: 21
Related Papers
50 records in total
  • [1] Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs
    Pichel, Juan C.
    Rivera, Francisco F.
    Fernandez, Marcos
    Rodriguez, Aurelio
    MICROPROCESSORS AND MICROSYSTEMS, 2012, 36 (02) : 65 - 77
  • [2] Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs
    Feng, Xiaowen
    Jin, Hai
    Zheng, Ran
    Hu, Kan
    Zeng, Jingxiang
    Shao, Zhiyuan
    2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011, : 165 - 172
  • [3] A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs
    Guo, Ping
    Wang, Liqiang
    Chen, Po
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (05) : 1112 - 1123
  • [4] Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs
    Monakov, Alexander
    Avetisyan, Arutyun
    EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION, PROCEEDINGS, 2009, 5657 : 289 - 297
  • [5] Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs
    Tanabe, Noboru
    Ogawa, Yuuka
    Takata, Masami
    Joe, Kazuki
    PROCEEDINGS OF THE 19TH INTERNATIONAL EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING, 2011, : 101 - 108
  • [6] Multiple-precision sparse matrix-vector multiplication on GPUs
    Isupov, Konstantin
    JOURNAL OF COMPUTATIONAL SCIENCE, 2022, 61
  • [7] Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications
    Ashari, Arash
    Sedaghati, Naser
    Eisenlohr, John
    Parthasarathy, Srinivasan
    Sadayappan, P.
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 781 - 792
  • [8] Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs
    Sedaghati, Naser
    Ashari, Arash
    Pouchet, Louis-Noel
    Parthasarathy, Srinivasan
    Sadayappan, P.
    2ND WORKSHOP ON PARALLEL PROGRAMMING FOR ANALYTICS APPLICATIONS (PPAA 2015), 2015, : 17 - 24
  • [9] Dense and Sparse Matrix-Vector Multiplication on Maxwell GPUs with PyCUDA
    Nurudin Alvarez, Francisco
    Antonio Ortega-Toro, Jose
    Ujaldon, Manuel
    HIGH PERFORMANCE COMPUTING CARLA 2016, 2017, 697 : 219 - 229
  • [10] TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs
    Niu, Yuyao
    Lu, Zhengyang
    Dong, Meichen
    Jin, Zhou
    Liu, Weifeng
    Tan, Guangming
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 68 - 78