Accelerating sparse matrix-matrix multiplication with GPU Tensor Cores

Cited by: 38

Authors:

Zachariadis, Orestis [1]

Satpute, Nitin [1]

Gomez-Luna, Juan [2]

Olivares, Joaquin [1]

Affiliations:

[1] Univ Cordoba, Dept Elect & Comp Engn, Cordoba, Spain

[2] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2020 / Vol. 88 / Issue 88

Funding:

EU Horizon 2020;

Keywords:

Sparse matrix multiplication; GPU; Tensor Cores; Parallel computing; SpGEMM; MANY-CORE;

DOI:

10.1016/j.compeleceng.2020.106848

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Sparse general matrix-matrix multiplication (spGEMM) is an essential component in many scientific and data analytics applications. However, the sparsity pattern of the input matrices and the interaction of their patterns make spGEMM challenging. Modern GPUs include Tensor Core Units (TCUs), which specialize in dense matrix multiplication. Our aim is to re-purpose TCUs for sparse matrices. The key idea of our spGEMM algorithm, tSparse, is to multiply sparse rectangular blocks using the mixed precision mode of TCUs. tSparse partitions the input matrices into tiles and operates only on tiles that contain one or more elements. It creates a task list of the tiles and performs matrix multiplication of these tiles using TCUs. To the best of our knowledge, this is the first time that TCUs have been used in the context of spGEMM. We show that spGEMM, with our tiling approach, benefits from TCUs. Our approach significantly improves the performance of spGEMM in comparison to cuSPARSE, CUSP, RMerge2, Nsparse, AC-SpGEMM and spECK.
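
The tiling idea described in the abstract can be illustrated with a minimal CPU sketch in plain Python. This is not the tSparse implementation: the tile size `TILE`, the dictionary-based storage, and the function names `to_tiles`, `build_tasks`, `tile_matmul` and `tiled_spgemm` are all assumptions made for illustration, and the dense tile product here is an ordinary loop standing in for the mixed-precision Tensor Core operation, which is not modeled.

```python
from collections import defaultdict

TILE = 4  # hypothetical tile size, chosen only for illustration


def to_tiles(coo, tile=TILE):
    """Group nonzeros {(row, col): value} into dense tiles keyed by (tile_row, tile_col).

    Only tiles holding at least one element are ever created.
    """
    tiles = {}
    for (r, c), v in coo.items():
        key = (r // tile, c // tile)
        block = tiles.setdefault(key, [[0.0] * tile for _ in range(tile)])
        block[r % tile][c % tile] = v
    return tiles


def build_tasks(a_tiles, b_tiles):
    """Build the task list: pairs of non-empty tiles whose inner tile index matches."""
    b_by_row = defaultdict(list)
    for key in b_tiles:
        b_by_row[key[0]].append(key)
    return [(a_key, b_key) for a_key in a_tiles for b_key in b_by_row[a_key[1]]]


def tile_matmul(x, y, tile=TILE):
    """Dense tile-by-tile product (CPU stand-in for a Tensor Core multiply)."""
    return [[sum(x[i][k] * y[k][j] for k in range(tile)) for j in range(tile)]
            for i in range(tile)]


def tiled_spgemm(a, b, tile=TILE):
    """Multiply two sparse matrices given as {(row, col): value} dictionaries."""
    a_tiles, b_tiles = to_tiles(a, tile), to_tiles(b, tile)
    c = defaultdict(float)
    for (ai, ak), (bk, bj) in build_tasks(a_tiles, b_tiles):
        prod = tile_matmul(a_tiles[(ai, ak)], b_tiles[(bk, bj)], tile)
        for i in range(tile):
            for j in range(tile):
                if prod[i][j] != 0.0:
                    # Accumulate partial products that land in the same output tile.
                    c[(ai * tile + i, bj * tile + j)] += prod[i][j]
    return dict(c)


a = {(0, 0): 2.0, (5, 1): 3.0}
b = {(0, 6): 4.0, (1, 1): 5.0}
result = tiled_spgemm(a, b)
```

In this sketch, empty tiles never appear in `a_tiles` or `b_tiles`, so the task list contains only products between tiles that hold at least one element, which is the property the abstract attributes to the tiling approach.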


Pages: 16
