Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs

Cited by: 1
Authors
Ahmad, Khalid [1 ,3 ]
Cecka, Cris [2 ,4 ]
Garland, Michael [2 ,4 ]
Hall, Mary [1 ,3 ]
Affiliations
[1] Univ Utah, Salt Lake City, UT USA
[2] NVIDIA Corp, Santa Clara, CA USA
[3] Univ Utah, Salt Lake City, UT 84108 USA
[4] NVIDIA Corp, Santa Clara, CA 95051 USA
Keywords
Sparse tensors; SpMM; data layout;
DOI
10.1145/3633462
CLC classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and related applications. SpTM is a multi-dimensional analog of sparse-matrix-dense-matrix multiplication (SpMM). In this article, we employ a hierarchical tensor data layout that can unfold a multidimensional tensor into a 2D matrix, making it possible to compute SpTM using SpMM kernel implementations for GPUs. We compare two SpMM-based implementations against the state-of-the-art PASTA sparse tensor contraction implementation: (1) SpMM with the hierarchical tensor data layout; and (2) unfolding followed by an invocation of cuSPARSE's SpMM. Results show that SpMM can outperform PASTA 70.9% of the time, but none of the three approaches is best overall. Therefore, we use a decision tree classifier to select the best-performing sparse tensor contraction kernel based on precomputed properties of the sparse tensor.
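The core idea in the abstract, reducing SpTM to SpMM by unfolding the sparse tensor into a 2D matrix, can be illustrated with a minimal sketch. This is not the paper's code; it is a hypothetical example using SciPy's generic sparse matrices (rather than the paper's hierarchical layout or cuSPARSE), contracting a sparse 3-D tensor with a dense matrix along its last mode.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical sketch: SpTM via unfolding + SpMM.
# Contract a sparse 3-D tensor X (I x J x K) with a dense matrix U (K x R)
# along the last mode: Y[i, j, r] = sum_k X[i, j, k] * U[k, r].

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)

# Build a random sparse tensor in COO form (distinct coordinates + values).
nnz = 20
flat = rng.choice(I * J * K, size=nnz, replace=False)
coords = np.unravel_index(flat, (I, J, K))
vals = rng.standard_normal(nnz)

# Unfold: flatten modes (i, j) into rows, keep mode k as columns,
# yielding a 2-D sparse matrix of shape (I*J, K).
rows = coords[0] * J + coords[1]
X_unf = sp.coo_matrix((vals, (rows, coords[2])), shape=(I * J, K)).tocsr()

U = rng.standard_normal((K, R))

# SpTM now reduces to a single SpMM on the unfolded tensor...
Y = (X_unf @ U).reshape(I, J, R)  # ...then fold the result back to 3-D.

# Check against a dense einsum reference.
X_dense = np.zeros((I, J, K))
X_dense[coords] = vals
Y_ref = np.einsum('ijk,kr->ijr', X_dense, U)
assert np.allclose(Y, Y_ref)
```

The unfolding step is pure index arithmetic on the nonzero coordinates, which is why a layout that stores coordinates hierarchically can expose the 2D view without materializing a new structure.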
Pages: 20
Related papers
50 records in total
  • [31] An Input-Adaptive and In-Place Approach to Dense Tensor-Times-Matrix Multiply
    Li, Jiajia
    Battaglino, Casey
    Perros, Ioakeim
    Sun, Jimeng
    Vuduc, Richard
    PROCEEDINGS OF SC15: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2015,
  • [32] "Wide or Tall" and "Sparse Matrix Dense Matrix" Multiplications
    Howell, Gary W.
    HIGH PERFORMANCE COMPUTING SYMPOSIUM 2011 (HPC 2011) - 2011 SPRING SIMULATION MULTICONFERENCE - BK 6 OF 8, 2011, 43 (02): : 159 - 165
  • [33] A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors
    Liu, Weifeng
    Vinter, Brian
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2015, 85 : 47 - 61
  • [34] Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs
    Wei, Bingxin
    Wang, Yizhuo
    Chang, Fangli
    Gao, Jianhua
    Ji, Weixing
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2024, 38 (03): : 245 - 259
  • [35] Advancing on an efficient sparse matrix multiplication kernel for modern GPUs
    Berger, Gonzalo
    Freire, Manuel
    Marini, Renzo
    Dufrechou, Ernesto
    Ezzatti, Pablo
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023, 35 (20):
  • [36] RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
    Brock, Benjamin
    Buluc, Aydin
    Yelick, Katherine
    PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024, 2024, : 225 - 235
  • [37] Optimizing Sparse Matrix Operations on GPUs using Merge Path
    Dalton, Steven
    Olson, Luke
    Baxter, Sean
    Merrill, Duane
    Garland, Michael
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 407 - 416
  • [38] Optimization techniques for sparse matrix-vector multiplication on GPUs
    Maggioni, Marco
    Berger-Wolf, Tanya
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2016, 93-94 : 66 - 86
  • [39] A new approach for sparse matrix vector product on NVIDIA GPUs
    Vazquez, F.
    Fernandez, J. J.
    Garzon, E. M.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (08): : 815 - 826
  • [40] On Implementing Sparse Matrix Multi-Vector Multiplication on GPUs
    Abu-Sufah, Walid
    Ahmad, Khalid
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 1117 - 1124