Design Principles for Sparse Matrix Multiplication on the GPU

Cited by: 58
Authors
Yang, Carl [1,2]
Buluc, Aydin [2,3]
Owens, John D. [1,2]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Funding
US National Science Foundation
Keywords
Sparse matrix multiplication; Parallel; GPU
DOI
10.1007/978-3-319-96983-1_48
CLC classification
TP301 [Theory and Methods]
Discipline code
081202
Abstract
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and on load balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory-access pattern that allows efficient access to both the input and output matrices and that is crucial to getting excellent performance on SpMM. By combining these two ingredients, (i) merge-based load balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
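To make the second ingredient concrete, the following is a minimal CUDA sketch of CSR SpMM with row-major coalesced access. It is hypothetical and not the authors' implementation: each thread computes one output element, and consecutive threads in a warp handle consecutive columns of the row-major dense matrices, so reads of B and writes of C coalesce. The paper's merge-based load balancing is omitted here.

// Hypothetical CSR SpMM kernel illustrating row-major coalesced access.
// A is num_rows x K in CSR (row_ptr, col_idx, vals); B is K x num_cols_B
// and C is num_rows x num_cols_B, both dense and row-major.
__global__ void spmm_csr_rowmajor(int num_rows, int num_cols_B,
                                  const int *row_ptr, const int *col_idx,
                                  const float *vals,
                                  const float *B, float *C)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // row of A and C
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // column of B and C
    if (row >= num_rows || col >= num_cols_B) return;

    float acc = 0.0f;
    for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
        // The nonzero of A is shared (broadcast) across the warp, while
        // accesses to B are contiguous across consecutive threads.
        acc += vals[k] * B[col_idx[k] * num_cols_B + col];
    }
    // Consecutive threads write consecutive elements of C: coalesced store.
    C[row * num_cols_B + col] = acc;
}

Note that this per-element mapping does nothing to balance work across rows with very different nonzero counts; handling that imbalance is what the paper's merge-based scheme addresses.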
Pages: 672-687
Number of pages: 16
Related Papers
50 records in total (items [21]-[30] shown)
  • [21] spECK: Accelerating GPU Sparse Matrix-Matrix Multiplication through Lightweight Analysis
    Parger, Mathias
    Winter, Martin
    Mlakar, Daniel
    Steinberger, Markus
    PROCEEDINGS OF THE 25TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '20), 2020, : 362 - 375
  • [22] Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures
    Deveci, Mehmet
    Trott, Christian
    Rajamanickam, Sivasankaran
    PARALLEL COMPUTING, 2018, 78 : 33 - 46
  • [23] Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU
    Yuan Tao
    Huang Zhi-Bin
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2016, 9 (10): : 99 - 106
  • [24] A hybrid format for better performance of sparse matrix-vector multiplication on a GPU
    Guo, Dahai
    Gropp, William
    Olson, Luke N.
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2016, 30 (01): : 103 - 120
  • [25] Accelerating Sparse General Matrix-Matrix Multiplication for NVIDIA Volta GPU and Hygon DCU
    Tian, Zhuo
    Yang, Shuai
    Zhang, Changyou
    PROCEEDINGS OF THE 32ND INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, HPDC 2023, 2023, : 329 - 330
  • [26] SPMSD: A Partitioning-Strategy for Parallel General Sparse Matrix-Matrix Multiplication on GPU
    Cui, Huanyu
    Wang, Nianbin
    Han, Qilong
    Wang, Ye
    PARALLEL PROCESSING LETTERS, 2024, 34 (02)
  • [27] Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU
    Gao, Jiaquan
    Qi, Panpan
    He, Guixia
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [28] Sparse matrix-vector multiplication design on FPGAs
    Sun, Junqing
    Peterson, Gregory
    Storaasli, Olaf
    FCCM 2007: 15TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2007, : 349 - +
  • [29] Sparse matrix multiplication
    Briggs, P
    ACM SIGPLAN NOTICES, 1996, 31 (11) : 33 - 37
  • [30] Design space exploration for sparse matrix-matrix multiplication on FPGAs
    Lin, Colin Yu
    Wong, Ngai
    So, Hayden Kwok-Hay
    INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, 2013, 41 (02) : 205 - 219