Design Principles for Sparse Matrix Multiplication on the GPU

Cited by: 58
Authors
Yang, Carl [1,2]
Buluc, Aydin [2,3]
Owens, John D. [1,2]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Funding
US National Science Foundation
Keywords
Sparse matrix multiplication; Parallel; GPU
DOI
10.1007/978-3-319-96983-1_48
CLC classification
TP301 [Theory and Methods]
Discipline code
081202
Abstract
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and on load balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory-access pattern that allows efficient access to both the input and output matrices and that is crucial to getting excellent performance on SpMM. By combining these two ingredients, (i) merge-based load balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
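To make the second ingredient concrete, the following is a minimal CUDA sketch of CSR SpMM with row-major coalesced access. It is hypothetical and not the authors' implementation: each thread computes one output element, and consecutive threads in a warp handle consecutive columns of the row-major dense matrices, so reads of B and writes of C coalesce. The paper's merge-based load balancing is omitted here.

// Hypothetical CSR SpMM kernel illustrating row-major coalesced access.
// A is num_rows x K in CSR (row_ptr, col_idx, vals); B is K x num_cols_B
// and C is num_rows x num_cols_B, both dense and row-major.
__global__ void spmm_csr_rowmajor(int num_rows, int num_cols_B,
                                  const int *row_ptr, const int *col_idx,
                                  const float *vals,
                                  const float *B, float *C)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // row of A and C
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // column of B and C
    if (row >= num_rows || col >= num_cols_B) return;

    float acc = 0.0f;
    for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
        // The nonzero of A is shared (broadcast) across the warp, while
        // accesses to B are contiguous across consecutive threads.
        acc += vals[k] * B[col_idx[k] * num_cols_B + col];
    }
    // Consecutive threads write consecutive elements of C: coalesced store.
    C[row * num_cols_B + col] = acc;
}

Note that this per-element mapping does nothing to balance work across rows with very different nonzero counts; handling that imbalance is what the paper's merge-based scheme addresses.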
Pages: 672-687
Number of pages: 16
Related Papers
50 records in total (items [21]-[30] shown)
  • [21] spECK: Accelerating GPU Sparse Matrix-Matrix Multiplication through Lightweight Analysis
    Parger, Mathias
    Winter, Martin
    Mlakar, Daniel
    Steinberger, Markus
    PROCEEDINGS OF THE 25TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '20), 2020, : 362 - 375
  • [22] Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures
    Deveci, Mehmet
    Trott, Christian
    Rajamanickam, Sivasankaran
    PARALLEL COMPUTING, 2018, 78 : 33 - 46
  • [23] Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU
    Yuan Tao
    Huang Zhi-Bin
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2016, 9 (10): : 99 - 106
  • [24] A hybrid format for better performance of sparse matrix-vector multiplication on a GPU
    Guo, Dahai
    Gropp, William
    Olson, Luke N.
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2016, 30 (01): : 103 - 120
  • [25] Accelerating Sparse General Matrix-Matrix Multiplication for NVIDIA Volta GPU and Hygon DCU
    Tian, Zhuo
    Yang, Shuai
    Zhang, Changyou
    PROCEEDINGS OF THE 32ND INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, HPDC 2023, 2023, : 329 - 330
  • [26] SPMSD: A Partitioning-Strategy for Parallel General Sparse Matrix-Matrix Multiplication on GPU
    Cui, Huanyu
    Wang, Nianbin
    Han, Qilong
    Wang, Ye
    PARALLEL PROCESSING LETTERS, 2024, 34 (02)
  • [27] Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU
    Gao, Jiaquan
    Qi, Panpan
    He, Guixia
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [28] Sparse matrix-vector multiplication design on FPGAs
    Sun, Junqing
    Peterson, Gregory
    Storaasli, Olaf
    FCCM 2007: 15TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2007, : 349 - +
  • [29] Sparse matrix multiplication
    Briggs, P
    ACM SIGPLAN NOTICES, 1996, 31 (11) : 33 - 37
  • [30] Design space exploration for sparse matrix-matrix multiplication on FPGAs
    Lin, Colin Yu
    Wong, Ngai
    So, Hayden Kwok-Hay
    INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, 2013, 41 (02) : 205 - 219