Design Principles for Sparse Matrix Multiplication on the GPU

Cited by: 58
Authors
Yang, Carl [1,2]
Buluc, Aydin [2,3]
Owens, John D. [1,2]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
Funding
U.S. National Science Foundation;
Keywords
Sparse matrix multiplication; Parallel; GPU;
DOI
10.1007/978-3-319-96983-1_48
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices and that is crucial to getting excellent performance on SpMM. By combining these two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
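To make the two ingredients named in the abstract concrete, here is a minimal CPU reference sketch in Python. It is not the authors' GPU implementation. The function names (`csr_spmm`, `split_nonzeros`), the flat row-major storage of the dense matrices, and the equal-nonzero partition are assumptions made for illustration. The partition in particular is a simplified stand-in for merge-based load-balancing, since it balances only nonzeros rather than rows and nonzeros together.

```python
from bisect import bisect_right


def csr_spmm(row_ptr, col_idx, vals, B, n_cols):
    """Compute C = A @ B with A in CSR and B, C dense, flat, row-major.

    B has n_cols columns; element (r, j) lives at B[r * n_cols + j].
    """
    m = len(row_ptr) - 1
    C = [0.0] * (m * n_cols)
    for i in range(m):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[k]
            b_row = col_idx[k] * n_cols
            # The inner loop walks one contiguous row of B and one contiguous
            # row of C. On a GPU, adjacent threads would take adjacent j,
            # which is what makes the row-major accesses coalesced.
            for j in range(n_cols):
                C[i * n_cols + j] += a * B[b_row + j]
    return C


def split_nonzeros(row_ptr, n_parts):
    """Split the nonzeros into n_parts nearly equal chunks (hypothetical helper).

    Returns one (first_nonzero, first_row) pair per part. The row is found by
    binary search in row_ptr, so long rows may be shared by several parts.
    """
    nnz = row_ptr[-1]
    chunk = -(-nnz // n_parts)  # ceiling division
    starts = []
    for p in range(n_parts):
        nz = min(p * chunk, nnz)
        row = bisect_right(row_ptr, nz) - 1
        starts.append((nz, row))
    return starts


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
B = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # 3 x 2, row-major

C = csr_spmm(row_ptr, col_idx, vals, B, 2)
parts = split_nonzeros(row_ptr, 2)
```

For this example, `C` holds the 3 x 2 product in row-major order, and `parts` assigns three nonzeros to each of the two workers even though the third row alone contains half of the nonzeros.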
Pages: 672-687
Page count: 16