Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs

Cited by: 1
Authors
Ahmad, Khalid [1]
Cecka, Cris [2]
Garland, Michael [2]
Hall, Mary [1]
Affiliations
[1] Univ Utah, Salt Lake City, UT 84108 USA
[2] NVIDIA Corp, Santa Clara, CA 95051 USA
Keywords
Sparse tensors; SpMM; data layout
DOI
10.1145/3633462
Chinese Library Classification
TP3 [computing technology, computer technology]
Subject Classification Code
0812
Abstract
An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and its applications. SpTM is a multi-dimensional analog of sparse-matrix-dense-matrix multiplication (SpMM). In this article, we employ a hierarchical tensor data layout that can unfold a multidimensional tensor into a 2D matrix, making it possible to compute SpTM using SpMM kernel implementations for GPUs. We compare two SpMM-based implementations against the state-of-the-art PASTA sparse tensor contraction implementation: (1) SpMM with the hierarchical tensor data layout; and (2) unfolding followed by an invocation of cuSPARSE's SpMM. Results show that SpMM can outperform PASTA 70.9% of the time, but none of the three approaches is best overall. We therefore use a decision tree classifier to select the best-performing sparse tensor contraction kernel based on precomputed properties of the sparse tensor.
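As a concrete illustration of the unfolding idea in the abstract, the sketch below performs a mode-n unfolding of a sparse COO tensor and then computes the tensor-times-matrix product with a single sparse-matrix-dense-matrix multiply. It is a minimal CPU analog written with NumPy/SciPy, not the paper's implementation: the tensor shape, coordinates, and values are invented for the example, and SciPy's CSR-times-dense product stands in for cuSPARSE's GPU SpMM.

import numpy as np
import scipy.sparse as sp

def unfold_for_ttm(shape, coords, vals, mode):
    # Rows: linearized indices of every mode except `mode`;
    # columns: the contracted mode. The CSR row order then matches
    # the row-major layout of the unfolded result tensor.
    rest = [m for m in range(len(shape)) if m != mode]
    rest_shape = [shape[m] for m in rest]
    rows = np.ravel_multi_index([coords[m] for m in rest], rest_shape)
    cols = coords[mode]
    n_rows = int(np.prod(rest_shape))
    return sp.csr_matrix((vals, (rows, cols)), shape=(n_rows, shape[mode]))

# A 4 x 3 x 5 sparse tensor X with six nonzeros (coordinates chosen by hand).
shape = (4, 3, 5)
coords = np.array([[0, 1, 3, 3, 2, 0],   # mode-0 indices
                   [0, 2, 1, 1, 0, 2],   # mode-1 indices
                   [4, 0, 2, 3, 1, 1]])  # mode-2 indices
vals = np.array([1.0, -2.0, 0.5, 3.0, -1.5, 2.0])

# Dense factor matrix U contracted against mode 1: Y[i,k,r] = sum_j X[i,j,k] U[j,r].
mode, R = 1, 16
rng = np.random.default_rng(0)
U = rng.standard_normal((shape[mode], R))

# One SpMM computes the whole sparse-tensor-times-dense-matrix product.
Y = unfold_for_ttm(shape, coords, vals, mode) @ U   # shape (4*5, 16)

# Check against a dense einsum reference.
X = np.zeros(shape)
X[tuple(coords)] = vals
assert np.allclose(Y, np.einsum('ijk,jr->ikr', X, U).reshape(-1, R))

Placing the contracted mode on the matrix columns means the CSR rows already follow the row-major layout of the result, so no permutation is needed after the multiply. The abstract's final step, choosing among PASTA, hierarchical-layout SpMM, and unfold-plus-cuSPARSE per tensor, can likewise be sketched with an off-the-shelf decision tree; the feature set and training data below are invented placeholders, not the paper's measured properties or labels.

from sklearn.tree import DecisionTreeClassifier

KERNELS = ["PASTA", "SpMM + hierarchical layout", "unfold + cuSPARSE SpMM"]

# Hypothetical per-tensor features: nnz, density, contracted-mode size,
# and product of the remaining mode sizes. Real training data would pair
# measured features with the fastest kernel observed for each tensor.
feats = np.array([[1e6, 1e-4,  512,  4096],
                  [5e4, 1e-2,   64,   256],
                  [2e7, 1e-6, 2048, 65536],
                  [8e5, 5e-4,  128,  1024]])
best = np.array([0, 1, 2, 1])  # fabricated labels indexing KERNELS

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(feats, best)
new_tensor = np.array([[3e5, 2e-4, 256, 2048]])  # properties of an unseen tensor
print("use:", KERNELS[int(clf.predict(new_tensor)[0])])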
Pages: 20