Design Principles for Sparse Matrix Multiplication on the GPU

Cited by: 58
Authors
Yang, Carl [1 ,2 ]
Buluc, Aydin [2 ,3 ]
Owens, John D. [1 ,2 ]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
Euro-Par 2018: Parallel Processing (Springer LNCS)
Funding
US National Science Foundation
Keywords
Sparse matrix multiplication; Parallel; GPU
DOI
10.1007/978-3-319-96983-1_48
CLC number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access to both the input and output matrices and is crucial to achieving excellent SpMM performance. By combining these two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
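As a concrete illustration of ingredient (ii), the sketch below is a minimal CUDA kernel for CSR SpMM that uses the row-major coalesced access pattern the abstract describes: one thread block per sparse row, with threads striding across the dense columns so that consecutive threads read consecutive elements of B and write consecutive elements of C. The kernel name, launch configuration, and one-row-per-block scheme are illustrative assumptions for this record, not the authors' implementation; in particular, the sketch omits ingredient (i), the merge-based load-balancing.

#include <cstdio>
#include <cuda_runtime.h>

// Minimal CSR SpMM sketch: C = A * B, where A is M x K sparse (CSR)
// and B (K x N) and C (M x N) are dense, stored row-major.
// One thread block per sparse row; threads stride across the N columns,
// so consecutive threads touch consecutive elements of a row of B and C
// (the row-major coalesced access pattern the abstract highlights).
__global__ void spmm_csr_rowmajor(int M, int N,
                                  const int*   __restrict__ rowPtr,
                                  const int*   __restrict__ colIdx,
                                  const float* __restrict__ vals,
                                  const float* __restrict__ B,
                                  float*       __restrict__ C) {
    int row = blockIdx.x;
    if (row >= M) return;
    for (int col = threadIdx.x; col < N; col += blockDim.x) {
        float acc = 0.0f;
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
            // colIdx[k] and vals[k] are identical across the block
            // (broadcast loads); the B access is coalesced across threads.
            acc += vals[k] * B[colIdx[k] * N + col];
        }
        C[row * N + col] = acc;
    }
}

int main() {
    // Tiny example: A = [[1,0,2],[0,3,0]] (2x3), B = [[1,2],[3,4],[5,6]] (3x2).
    // Expected C = [[11,14],[9,12]].
    const int M = 2, N = 2;
    int   hRowPtr[] = {0, 2, 3};
    int   hColIdx[] = {0, 2, 1};
    float hVals[]   = {1.f, 2.f, 3.f};
    float hB[]      = {1.f, 2.f, 3.f, 4.f, 5.f, 6.f};
    float hC[M * N];

    int *dRowPtr, *dColIdx;
    float *dVals, *dB, *dC;
    cudaMalloc(&dRowPtr, sizeof(hRowPtr));
    cudaMalloc(&dColIdx, sizeof(hColIdx));
    cudaMalloc(&dVals,   sizeof(hVals));
    cudaMalloc(&dB,      sizeof(hB));
    cudaMalloc(&dC,      sizeof(hC));
    cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(dColIdx, hColIdx, sizeof(hColIdx), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals,   hVals,   sizeof(hVals),   cudaMemcpyHostToDevice);
    cudaMemcpy(dB,      hB,      sizeof(hB),      cudaMemcpyHostToDevice);

    spmm_csr_rowmajor<<<M, 128>>>(M, N, dRowPtr, dColIdx, dVals, dB, dC);
    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
    printf("%.0f %.0f\n%.0f %.0f\n", hC[0], hC[1], hC[2], hC[3]);
    return 0;
}

Compiled with nvcc, the example prints C = [[11, 14], [9, 12]]. The simple one-row-per-block scheme stalls when row lengths are skewed; the paper's merge-based load-balancing addresses exactly that by dividing the combined row-and-nonzero workload evenly across threads.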
Pages: 672-687
Page count: 16
Related Papers
50 records in total
  • [31] HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
    Li, Zhonggen
    Ke, Xiangyu
    Zhu, Yifan
    Gao, Yunjun
    Tu, Yaofeng
arXiv
  • [32] Performance Evaluation of Sparse Matrix-Vector Multiplication Using GPU/MIC Cluster
    Maeda, Hiroshi
    Takahashi, Daisuke
    PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2015, : 396 - 399
  • [33] Optimization of Sparse Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU
    Kubota, Yuji
    Takahashi, Daisuke
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2011, PT II, 2011, 6783 : 547 - 561
  • [34] Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU
    Nagasaka, Yusuke
    Nukada, Akira
    Matsuoka, Satoshi
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016), 2016, 80 : 131 - 142
  • [35] TaiChi: A Hybrid Compression Format for Binary Sparse Matrix-Vector Multiplication on GPU
    Gao, Jianhua
    Ji, Weixing
    Tan, Zhaonian
    Wang, Yizhuo
    Shi, Feng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3732 - 3745
  • [36] Reducing Vector I/O for Faster GPU Sparse Matrix-Vector Multiplication
    Nguyen Quang Anh Pham
    Fan, Rui
    Wen, Yonggang
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 1043 - 1052
  • [37] An efficient sparse stiffness matrix vector multiplication using compressed sparse row storage format on AMD GPU
    Xing, Longyue
    Wang, Zhaoshun
    Ding, Zhezhao
    Chu, Genshen
    Dong, Lingyu
    Xiao, Nan
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (23)
  • [38] Coded Sparse Matrix Multiplication
    Wang, Sinong
    Liu, Jiashang
    Shroff, Ness
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [39] Fast Sparse Matrix Multiplication
    Yuster, Raphael
    Zwick, Uri
    ACM TRANSACTIONS ON ALGORITHMS, 2005, 1 (01) : 2 - 13
  • [40] Fast sparse matrix multiplication
    Yuster, R
    Zwick, U
    ALGORITHMS ESA 2004, PROCEEDINGS, 2004, 3221 : 604 - 615