Characterization of data movement requirements for sparse matrix computations on GPUs

Cited by: 4
Authors
Kurt, Sureyya Emre [1 ]
Thumma, Vineeth [1 ]
Hong, Changwan [1 ]
Sukumaran-Rajam, Aravind [1 ]
Sadayappan, P. [1 ]
Affiliation
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
US National Science Foundation
Keywords
data-movement bounds; sparse matrix-vector multiplication (SpMV); sparse matrix-matrix multiplication (SpGEMM); graph analytics; hypergraph partitioning; GPU computing;
DOI
10.1109/HiPC.2017.00040
Chinese Library Classification
TP3 (Computing Technology, Computer Technology)
Discipline Code
0812
Abstract
Tight data-movement lower bounds are known for dense matrix-vector multiplication and dense matrix-matrix multiplication, and practical GPU implementations achieve performance quite close to the roofline bounds based on operational intensity. For large dense matrices, matrix-vector multiplication is bandwidth-limited and its performance is significantly lower than that of matrix-matrix multiplication. In contrast, the performance of sparse matrix-matrix multiplication (SpGEMM) is generally much lower than that of sparse matrix-vector multiplication (SpMV). In this paper, we use a combination of lower-bound and upper-bound analysis of data-movement requirements, together with hardware-counter-based measurements, to gain insight into the performance limitations of existing SpGEMM implementations on GPUs. The analysis motivates the development of an adaptive work-distribution strategy among threads and results in performance improvements for SpGEMM code on GPUs.
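The abstract's claim that SpMV is bandwidth-limited can be illustrated with a back-of-envelope roofline estimate. The sketch below is not from the paper: the hardware numbers are hypothetical, and the per-nonzero byte count is an optimistic accounting (value plus column index, assuming the input and output vectors are reused from cache) for a CSR-format SpMV.

```python
# Back-of-envelope roofline estimate for CSR SpMV in double precision.
# Hardware numbers below are hypothetical, chosen only for illustration.
PEAK_FLOPS = 7.0e12   # assumed peak double-precision rate, flop/s
BANDWIDTH = 9.0e11    # assumed peak memory bandwidth, byte/s

def spmv_operational_intensity(bytes_per_val=8, bytes_per_idx=4):
    """Flops per byte for CSR SpMV, counting only per-nonzero traffic:
    each nonzero costs one multiply-add (2 flops) and must stream its
    8-byte value and 4-byte column index; vector reuse is assumed free."""
    flops_per_nnz = 2
    bytes_per_nnz = bytes_per_val + bytes_per_idx
    return flops_per_nnz / bytes_per_nnz

def roofline_attainable(oi):
    """Attainable flop rate under the roofline model: the lesser of the
    compute peak and what the memory system can feed at intensity oi."""
    return min(PEAK_FLOPS, BANDWIDTH * oi)

oi = spmv_operational_intensity()   # 2/12 ≈ 0.167 flop/byte
perf = roofline_attainable(oi)      # bandwidth-bound, far below PEAK_FLOPS
```

With an operational intensity near 0.17 flop/byte, the attainable rate is a small fraction of peak, which is why large-matrix SpMV sits on the bandwidth-limited slope of the roofline.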
Pages: 283-293 (11 pages)