Characterization of data movement requirements for sparse matrix computations on GPUs

被引:4
|
作者
Kurt, Sureyya Emre [1 ]
Thumma, Vineeth [1 ]
Hong, Changwan [1 ]
Sukumaran-Rajam, Aravind [1 ]
Sadayappan, P. [1 ]
机构
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
基金
美国国家科学基金会;
关键词
data-movement bounds; sparse matrix-vector multiplication (SpMV); sparse matrix-matrix multiplication (SpGEMM); graph analytics; hypergraph partitioning; GPU computing;
D O I
10.1109/HiPC.2017.00040
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Tight data movement lower bounds are known for dense matrix-vector multiplication and dense matrix-matrix multiplication and practical implementations exist on GPUs that achieve performance quite close to the roofline bounds based on operational intensity. For large dense matrices, matrix-vector multiplication is bandwidth-limited and its performance is significantly lower than matrix-matrix multiplication. However, in contrast, the performance of sparse matrix-matrix multiplication (SpGEMM) is generally much lower than that of sparse matrix-vector multiplication (SpMV). In this paper, we use a combination of lower-bounds and upper-bounds analysis of data movement requirements, as well as hardware counter based measurements to gain insights into the performance limitations of existing implementations for SpGEMM on GPUs. The analysis motivates the development of an adaptive work distribution strategy among threads and results in performance enhancement for SpGEMM code on GPUs.
引用
收藏
页码:283 / 293
页数:11
相关论文
共 50 条
  • [31] TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs
    Ji, Haonan
    Song, Huimin
    Lu, Shibo
    Jin, Zhou
    Tan, Guangming
    Liu, Weifeng
    51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022, 2022,
  • [32] Regularizing Irregularity: Bitmap-based and Portable Sparse Matrix Multiplication for Graph Data on GPUs
    Zhang, Jianting
    Gruenwald, Le
    GRADES-NDA '18: PROCEEDINGS OF THE 1ST ACM SIGMOD JOINT INTERNATIONAL WORKSHOP ON GRAPH DATA MANAGEMENT EXPERIENCES & SYSTEMS (GRADES) AND NETWORK DATA ANALYTICS (NDA) 2018 (GRADES-NDA 2018), 2018,
  • [33] A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors
    Liu, Weifeng
    Vinter, Brian
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2015, 85 : 47 - 61
  • [34] Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs
    Wei, Bingxin
    Wang, Yizhuo
    Chang, Fangli
    Gao, Jianhua
    Ji, Weixing
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2024, 38 (03): : 245 - 259
  • [35] Advancing on an efficient sparse matrix multiplication kernel for modern GPUs
    Berger, Gonzalo
    Freire, Manuel
    Marini, Renzo
    Dufrechou, Ernesto
    Ezzatti, Pablo
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023, 35 (20):
  • [36] RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
    Brock, Benjamin
    Buluc, Aydin
    Yelick, Katherine
    PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024, 2024, : 225 - 235
  • [37] Optimizing Sparse Matrix Operations on GPUs using Merge Path
    Dalton, Steven
    Olson, Luke
    Baxter, Sean
    Merrill, Duane
    Garland, Michael
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 407 - 416
  • [38] Optimization techniques for sparse matrix-vector multiplication on GPUs
    Maggioni, Marco
    Berger-Wolf, Tanya
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2016, 93-94 : 66 - 86
  • [39] A new approach for sparse matrix vector product on NVIDIA GPUs
    Vazquez, F.
    Fernandez, J. J.
    Garzon, E. M.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (08): : 815 - 826
  • [40] On Implementing Sparse Matrix Multi-Vector Multiplication on GPUs
    Abu-Sufah, Walid
    Ahmad, Khalid
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 1117 - 1124