Accelerating sparse matrix-matrix multiplication with GPU Tensor Cores

Cited by: 38
Authors
Zachariadis, Orestis [1]
Satpute, Nitin [1]
Gomez-Luna, Juan [2]
Olivares, Joaquin [1]
Affiliations
[1] Univ Cordoba, Dept Elect & Comp Engn, Cordoba, Spain
[2] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
Funding
EU Horizon 2020
Keywords
Sparse matrix multiplication; GPU; Tensor Cores; Parallel computing; SpGEMM; MANY-CORE;
DOI
10.1016/j.compeleceng.2020.106848
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sparse general matrix-matrix multiplication (spGEMM) is an essential component in many scientific and data analytics applications. However, the sparsity pattern of the input matrices and the interaction of their patterns make spGEMM challenging. Modern GPUs include Tensor Core Units (TCUs), which specialize in dense matrix multiplication. Our aim is to re-purpose TCUs for sparse matrices. The key idea of our spGEMM algorithm, tSparse, is to multiply sparse rectangular blocks using the mixed precision mode of TCUs. tSparse partitions the input matrices into tiles and operates only on tiles which contain one or more elements. It creates a task list of the tiles, and performs matrix multiplication of these tiles using TCUs. To the best of our knowledge, this is the first time that TCUs are used in the context of spGEMM. We show that spGEMM, with our tiling approach, benefits from TCUs. Our approach significantly improves the performance of spGEMM in comparison to cuSPARSE, CUSP, RMerge2, Nsparse, AC-SpGEMM and spECK.
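The block-wise idea in the abstract can be illustrated with a minimal CPU-only sketch. It is not the tSparse implementation: the block edge `TILE`, the COO input format, and all function names (`to_tiles`, `build_tasks`, `multiply_accumulate`, `tiled_spgemm`) are assumptions made for illustration, and the plain dense block product stands in for the Tensor Core step (mixed precision is not modeled).

```python
from collections import defaultdict

TILE = 2  # hypothetical block edge, kept small for illustration


def to_tiles(coo, tile=TILE):
    """Group COO entries (row, col, value) into dense blocks keyed by block coordinates.

    Only blocks that contain at least one entry are ever created.
    """
    tiles = {}
    for r, c, v in coo:
        key = (r // tile, c // tile)
        block = tiles.setdefault(key, [[0.0] * tile for _ in range(tile)])
        block[r % tile][c % tile] += v
    return tiles


def build_tasks(a_tiles, b_tiles):
    """List (output block, A block, B block) triples whose inner block index matches."""
    b_cols_by_row = defaultdict(list)
    for (k, j) in b_tiles:
        b_cols_by_row[k].append(j)
    tasks = []
    for (i, k) in a_tiles:
        for j in b_cols_by_row.get(k, ()):
            tasks.append(((i, j), (i, k), (k, j)))
    return tasks


def multiply_accumulate(x, y, acc):
    """acc += x @ y for small dense blocks (CPU stand-in for the Tensor Core step)."""
    n = len(x)
    for i in range(n):
        for k in range(n):
            for j in range(n):
                acc[i][j] += x[i][k] * y[k][j]


def tiled_spgemm(a_coo, b_coo, tile=TILE):
    """Multiply two sparse matrices given as COO lists; return {(row, col): value}."""
    a_tiles = to_tiles(a_coo, tile)
    b_tiles = to_tiles(b_coo, tile)
    c_tiles = {}
    for out_key, a_key, b_key in build_tasks(a_tiles, b_tiles):
        acc = c_tiles.setdefault(out_key, [[0.0] * tile for _ in range(tile)])
        multiply_accumulate(a_tiles[a_key], b_tiles[b_key], acc)
    # Convert the output blocks back to sparse form, dropping explicit zeros.
    result = {}
    for (ti, tj), block in c_tiles.items():
        for r in range(tile):
            for c in range(tile):
                if block[r][c] != 0.0:
                    result[(ti * tile + r, tj * tile + c)] = block[r][c]
    return result


A = [(0, 0, 1.0), (0, 3, 2.0), (2, 1, 3.0)]
B = [(0, 1, 4.0), (3, 2, 5.0), (1, 0, 6.0)]
C = tiled_spgemm(A, B)
```

In this sketch the task list is built before any arithmetic, so empty blocks never generate work; only pairs of non-empty blocks with a matching inner index are multiplied.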
Pages: 16
Related papers
50 in total
  • [1] Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores
    Zhao, Haisha
    Li, San
    Wang, Jiaheng
    Zhou, Chunbao
    Wang, Jue
    Xin, Zhikuang
    Li, Shunde
    Liang, Zhiqiang
    Pan, Zhijie
    Liu, Fang
    Zeng, Yan
    Wang, Yangang
    Chi, Xuebin
    PROCEEDINGS OF THE 2025 THE 30TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2025, 2025, : 326 - 338
  • [2] HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
    Li, Zhonggen
    Ke, Xiangyu
    Zhu, Yifan
    Gao, Yunjun
    Tu, Yaofeng
    arXiv,
  • [3] Optimizing Sparse Matrix-Matrix Multiplication for the GPU
    Dalton, Steven
    Olson, Luke
    Bell, Nathan
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2015, 41 (04):
  • [4] Adaptive Sparse Matrix-Matrix Multiplication on the GPU
    Winter, Martin
    Mlakar, Daniel
    Zayer, Rhaleb
    Seidel, Hans-Peter
    Steinberger, Markus
    PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), 2019, : 68 - 81
  • [5] spECK: Accelerating GPU Sparse Matrix-Matrix Multiplication through Lightweight Analysis
    Parger, Mathias
    Winter, Martin
    Mlakar, Daniel
    Steinberger, Markus
    PROCEEDINGS OF THE 25TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '20), 2020, : 362 - 375
  • [6] Accelerating Sparse General Matrix-Matrix Multiplication for NVIDIA Volta GPU and Hygon DCU
    Tian, Zhuo
    Yang, Shuai
    Zhang, Changyou
    PROCEEDINGS OF THE 32ND INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, HPDC 2023, 2023, : 329 - 330
  • [7] An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data
    Liu, Weifeng
    Vinter, Brian
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [8] Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures
    Deveci, Mehmet
    Trott, Christian
    Rajamanickam, Sivasankaran
    PARALLEL COMPUTING, 2018, 78 : 33 - 46
  • [9] GPU-ACCELERATED SPARSE MATRIX-MATRIX MULTIPLICATION BY ITERATIVE ROW MERGING
    Gremse, Felix
    Hoefter, Andreas
    Schwen, Lars Ole
    Kiessling, Fabian
    Naumann, Uwe
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01): : C54 - C71
  • [10] PERFORMANCE EVALUATION OF SPARSE MATRIX-MATRIX MULTIPLICATION
    Jain-Mendon, Shweta
    Sass, Ron
    2013 23RD INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2013) PROCEEDINGS, 2013,