Accelerating sparse matrix-matrix multiplication with GPU Tensor Cores

Cited by: 38
Authors
Zachariadis, Orestis [1 ]
Satpute, Nitin [1 ]
Gomez-Luna, Juan [2 ]
Olivares, Joaquin [1 ]
Affiliations
[1] Univ Cordoba, Dept Elect & Comp Engn, Cordoba, Spain
[2] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
Funding
EU Horizon 2020;
Keywords
Sparse matrix multiplication; GPU; Tensor Cores; Parallel computing; SpGEMM; Many-core;
DOI
10.1016/j.compeleceng.2020.106848
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Sparse general matrix-matrix multiplication (spGEMM) is an essential component in many scientific and data analytics applications. However, the sparsity patterns of the input matrices, and the interaction between those patterns, make spGEMM challenging. Modern GPUs include Tensor Core Units (TCUs), which specialize in dense matrix multiplication. Our aim is to re-purpose TCUs for sparse matrices. The key idea of our spGEMM algorithm, tSparse, is to multiply sparse rectangular blocks using the mixed-precision mode of TCUs. tSparse partitions the input matrices into tiles and operates only on tiles that contain one or more elements. It creates a task list of the tiles and performs the matrix multiplication of these tiles using TCUs. To the best of our knowledge, this is the first time TCUs have been used in the context of spGEMM. We show that spGEMM, with our tiling approach, benefits from TCUs. Our approach significantly improves the performance of spGEMM in comparison to cuSPARSE, CUSP, RMerge2, Nsparse, AC-SpGEMM and spECK.
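The tiling idea described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the general scheme (partition into fixed-size tiles, keep only nonempty tiles, build a task list of tile pairs, multiply each pair densely), not the tSparse implementation: the tile size, function names, and the dense per-tile product standing in for the Tensor Core mixed-precision MMA are all illustrative assumptions.

```python
import numpy as np

T = 4  # tile edge; real TCUs operate on small fixed-size fragments (e.g. 16x16)

def nonempty_tiles(M):
    """Map (tile_row, tile_col) -> dense tile, keeping only tiles that
    contain at least one nonzero element."""
    tiles = {}
    for i in range(0, M.shape[0], T):
        for j in range(0, M.shape[1], T):
            blk = M[i:i+T, j:j+T]
            if np.any(blk):
                tiles[(i // T, j // T)] = blk
    return tiles

def tiled_spgemm(A, B):
    """Tile-based spGEMM sketch: only tile pairs where both operands are
    nonempty generate work; each pair becomes a dense tile product."""
    a_tiles, b_tiles = nonempty_tiles(A), nonempty_tiles(B)
    # Task list: every (i, k, j) where A-tile (i, k) meets B-tile (k, j).
    tasks = [(i, k, j)
             for (i, k) in a_tiles
             for (k2, j) in b_tiles
             if k == k2]
    C = np.zeros((A.shape[0], B.shape[1]))
    for i, k, j in tasks:
        # Dense tile product; on the GPU this is the mixed-precision TCU MMA.
        C[i*T:(i+1)*T, j*T:(j+1)*T] += a_tiles[(i, k)] @ b_tiles[(k, j)]
    return C
```

Skipping empty tiles is what lets the dense TCU primitive pay off on sparse inputs: work is proportional to the number of interacting nonempty tile pairs rather than to the full dense tile grid.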
Pages: 16
Related papers (50 records in total; entries [41]-[50] shown)
  • [41] Liu, Junhong; He, Xin; Liu, Weifeng; Tan, Guangming. Register-based Implementation of the Sparse General Matrix-Matrix Multiplication on GPUs. ACM SIGPLAN Notices, 2018, 53(1): 407-408.
  • [42] Akbudak, Kadir; Aykanat, Cevdet. Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(8): 2258-2271.
  • [43] Han, Yoonsang; Kim, Inseo; Kim, Jinsung; Moon, Gordon Euhyun. Tensor Core-Adapted Sparse Matrix Multiplication for Accelerating Sparse Deep Neural Networks. Electronics, 2024, 13(20).
  • [44] Niu, Yuyao; Lu, Zhengyang; Ji, Haonan; Song, Shuhui; Jin, Zhou; Liu, Weifeng. TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs. PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2022: 90-106.
  • [45] Beaumont, O.; Boudet, V.; Rastello, F.; Robert, Y. Matrix-matrix multiplication on heterogeneous platforms. 2000 International Conference on Parallel Processing, Proceedings, 2000: 289-298.
  • [46] Yang, Carl; Buluc, Aydin; Owens, John D. Design Principles for Sparse Matrix Multiplication on the GPU. Euro-Par 2018: Parallel Processing, 2018, 11014: 672-687.
  • [47] Neelima, B.; Reddy, G. Ram Mohana; Raghavendra, Prakash S. A GPU Framework for Sparse Matrix Vector Multiplication. 2014 IEEE 13th International Symposium on Parallel and Distributed Computing (ISPDC), 2014: 51-58.
  • [48] Tao, Yuan; Deng, Yangdong; Mu, Shuai; Zhang, Zhenzhong; Zhu, Mingfa; Xiao, Limin; Ruan, Li. GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication. Concurrency and Computation: Practice & Experience, 2015, 27(14): 3771-3789.
  • [49] Shi, Shaohuai; Wang, Qiang; Chu, Xiaowen. Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format. 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), 2020: 19-26.
  • [50] Li, Shiqing; Huai, Shuo; Liu, Weichen. An Efficient Gustavson-Based Sparse Matrix-Matrix Multiplication Accelerator on Embedded FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42(12): 4671-4680.