Accelerating sparse matrix-matrix multiplication with GPU Tensor Cores

Cited by: 38
Authors
Zachariadis, Orestis [1 ]
Satpute, Nitin [1 ]
Gomez-Luna, Juan [2 ]
Olivares, Joaquin [1 ]
Affiliations
[1] Univ Cordoba, Dept Elect & Comp Engn, Cordoba, Spain
[2] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
Funding
EU Horizon 2020;
Keywords
Sparse matrix multiplication; GPU; Tensor Cores; Parallel computing; SpGEMM; Many-core;
DOI
10.1016/j.compeleceng.2020.106848
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Sparse general matrix-matrix multiplication (spGEMM) is an essential component in many scientific and data analytics applications. However, the sparsity patterns of the input matrices, and the interaction between those patterns, make spGEMM challenging. Modern GPUs include Tensor Core Units (TCUs), which specialize in dense matrix multiplication. Our aim is to re-purpose TCUs for sparse matrices. The key idea of our spGEMM algorithm, tSparse, is to multiply sparse rectangular blocks using the mixed-precision mode of TCUs. tSparse partitions the input matrices into tiles and operates only on tiles that contain one or more elements. It creates a task list of the tiles and performs the matrix multiplication of these tiles using TCUs. To the best of our knowledge, this is the first time TCUs have been used in the context of spGEMM. We show that spGEMM, with our tiling approach, benefits from TCUs. Our approach significantly improves the performance of spGEMM in comparison to cuSPARSE, CUSP, RMerge2, Nsparse, AC-SpGEMM and spECK.
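The tiling idea described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the general scheme (partition into fixed-size tiles, keep only nonempty tiles, build a task list of tile pairs, multiply each pair densely), not the tSparse implementation: the tile size, function names, and the dense per-tile product standing in for the Tensor Core mixed-precision MMA are all illustrative assumptions.

```python
import numpy as np

T = 4  # tile edge; real TCUs operate on small fixed-size fragments (e.g. 16x16)

def nonempty_tiles(M):
    """Map (tile_row, tile_col) -> dense tile, keeping only tiles that
    contain at least one nonzero element."""
    tiles = {}
    for i in range(0, M.shape[0], T):
        for j in range(0, M.shape[1], T):
            blk = M[i:i+T, j:j+T]
            if np.any(blk):
                tiles[(i // T, j // T)] = blk
    return tiles

def tiled_spgemm(A, B):
    """Tile-based spGEMM sketch: only tile pairs where both operands are
    nonempty generate work; each pair becomes a dense tile product."""
    a_tiles, b_tiles = nonempty_tiles(A), nonempty_tiles(B)
    # Task list: every (i, k, j) where A-tile (i, k) meets B-tile (k, j).
    tasks = [(i, k, j)
             for (i, k) in a_tiles
             for (k2, j) in b_tiles
             if k == k2]
    C = np.zeros((A.shape[0], B.shape[1]))
    for i, k, j in tasks:
        # Dense tile product; on the GPU this is the mixed-precision TCU MMA.
        C[i*T:(i+1)*T, j*T:(j+1)*T] += a_tiles[(i, k)] @ b_tiles[(k, j)]
    return C
```

Skipping empty tiles is what lets the dense TCU primitive pay off on sparse inputs: work is proportional to the number of interacting nonempty tile pairs rather than to the full dense tile grid.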
Pages: 16
Related papers (50 records in total; entries [41]-[50] shown)
  • [41] Liu, Junhong; He, Xin; Liu, Weifeng; Tan, Guangming. Register-based Implementation of the Sparse General Matrix-Matrix Multiplication on GPUs. ACM SIGPLAN Notices, 2018, 53(1): 407-408.
  • [42] Akbudak, Kadir; Aykanat, Cevdet. Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(8): 2258-2271.
  • [43] Han, Yoonsang; Kim, Inseo; Kim, Jinsung; Moon, Gordon Euhyun. Tensor Core-Adapted Sparse Matrix Multiplication for Accelerating Sparse Deep Neural Networks. Electronics, 2024, 13(20).
  • [44] Niu, Yuyao; Lu, Zhengyang; Ji, Haonan; Song, Shuhui; Jin, Zhou; Liu, Weifeng. TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs. PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2022: 90-106.
  • [45] Beaumont, O.; Boudet, V.; Rastello, F.; Robert, Y. Matrix-matrix multiplication on heterogeneous platforms. 2000 International Conference on Parallel Processing, Proceedings, 2000: 289-298.
  • [46] Yang, Carl; Buluc, Aydin; Owens, John D. Design Principles for Sparse Matrix Multiplication on the GPU. Euro-Par 2018: Parallel Processing, 2018, 11014: 672-687.
  • [47] Neelima, B.; Reddy, G. Ram Mohana; Raghavendra, Prakash S. A GPU Framework for Sparse Matrix Vector Multiplication. 2014 IEEE 13th International Symposium on Parallel and Distributed Computing (ISPDC), 2014: 51-58.
  • [48] Tao, Yuan; Deng, Yangdong; Mu, Shuai; Zhang, Zhenzhong; Zhu, Mingfa; Xiao, Limin; Ruan, Li. GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication. Concurrency and Computation: Practice & Experience, 2015, 27(14): 3771-3789.
  • [49] Shi, Shaohuai; Wang, Qiang; Chu, Xiaowen. Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format. 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), 2020: 19-26.
  • [50] Li, Shiqing; Huai, Shuo; Liu, Weichen. An Efficient Gustavson-Based Sparse Matrix-Matrix Multiplication Accelerator on Embedded FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42(12): 4671-4680.