Design Principles for Sparse Matrix Multiplication on the GPU

Times cited: 58
Authors
Yang, Carl [1, 2]
Buluc, Aydin [2, 3]
Owens, John D. [1, 2]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
Funding
US National Science Foundation
Keywords
Sparse matrix multiplication; Parallel; GPU
DOI
10.1007/978-3-319-96983-1_48
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline code
081202
Abstract
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and on load balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern, crucial to achieving excellent SpMM performance, that allows efficient access into both the input and output matrices. By combining these two ingredients, (i) merge-based load balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
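The two ingredients named in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU code. It assumes that "merge-based load balancing" follows the merge-path decomposition used in merge-based SpMV (as in NVIDIA CUB), where the rows plus nonzeros of a CSR matrix are cut into equal-length segments. It also assumes that "row-major coalesced access" means each nonzero A[i][j] scales the contiguous row j of a row-major B and accumulates it into the contiguous row i of C. The names `csr_spmm`, `merge_path_search`, `merge_spmm` and `num_workers` are hypothetical. Each iteration over `w` stands in for one thread block or warp, and the innermost loop over `c` stands in for the threads of a warp reading adjacent elements of B.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def csr_spmm(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], B: Matrix) -> Matrix:
    """Reference row-split SpMM: C = A @ B, A in CSR, B dense row-major."""
    num_rows, n = len(row_ptr) - 1, len(B[0])
    C = [[0.0] * n for _ in range(num_rows)]
    for i in range(num_rows):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a, b_row = vals[p], B[col_idx[p]]
            # Row-major access: read row col_idx[p] of B and update row i of C
            # contiguously (on a GPU, adjacent threads would take adjacent c).
            for c in range(n):
                C[i][c] += a * b_row[c]
    return C


def merge_path_search(diagonal: int, row_end: Sequence[int],
                      nnz: int) -> Tuple[int, int]:
    """Binary-search the (row, nonzero) coordinate where `diagonal` meets the
    merge path of row end offsets and nonzero indices 0..nnz-1."""
    lo, hi = max(diagonal - nnz, 0), min(diagonal, len(row_end))
    while lo < hi:
        pivot = (lo + hi) // 2
        if row_end[pivot] <= diagonal - pivot - 1:
            lo = pivot + 1  # row `pivot` ends before this diagonal
        else:
            hi = pivot
    return lo, diagonal - lo


def merge_spmm(row_ptr: Sequence[int], col_idx: Sequence[int],
               vals: Sequence[float], B: Matrix, num_workers: int) -> Matrix:
    """Merge-based SpMM: each worker gets an equal share of rows + nonzeros."""
    num_rows, nnz, n = len(row_ptr) - 1, row_ptr[-1], len(B[0])
    row_end = row_ptr[1:]
    total = num_rows + nnz
    per_worker = -(-total // num_workers)  # ceiling division
    C = [[0.0] * n for _ in range(num_rows)]
    carries = []  # (row, partial sum) left over at the end of each segment
    for w in range(num_workers):  # each w models one thread block / warp
        d_start = min(w * per_worker, total)
        d_end = min(d_start + per_worker, total)
        x, y = merge_path_search(d_start, row_end, nnz)
        x_end, y_end = merge_path_search(d_end, row_end, nnz)
        acc = [0.0] * n
        while x < x_end:  # rows that finish inside this segment
            while y < row_end[x]:
                a, b_row = vals[y], B[col_idx[y]]
                for c in range(n):
                    acc[c] += a * b_row[c]
                y += 1
            C[x] = acc
            acc = [0.0] * n
            x += 1
        while y < y_end:  # start of row x_end, finished by a later segment
            a, b_row = vals[y], B[col_idx[y]]
            for c in range(n):
                acc[c] += a * b_row[c]
            y += 1
        carries.append((x_end, acc))
    for row, partial in carries:  # fix-up: add carried partial sums
        if row < num_rows:
            for c in range(n):
                C[row][c] += partial[c]
    return C
```

Because every worker consumes the same number of merge-path steps, a single very long row is split across several workers instead of stalling one of them. The partial sums of such a row are the carries added in the final fix-up pass.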
Pages: 672-687
Number of pages: 16