Design Principles for Sparse Matrix Multiplication on the GPU

Cited by: 58

Authors:

Yang, Carl [1, 2]

Buluc, Aydin [2, 3]

Owens, John D. [1, 2]

Affiliations:

[1] Univ Calif Davis, Davis, CA 95616 USA

[2] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA

[3] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

EURO-PAR 2018: PARALLEL PROCESSING | 2018 / Vol. 11014

Funding:

National Science Foundation (USA);

Keywords:

Sparse matrix multiplication; Parallel; GPU;

DOI:

10.1007/978-3-319-96983-1_48

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices and that is crucial to getting excellent performance on SpMM. By combining these two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
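For readers unfamiliar with the operation the abstract describes, here is a minimal sequential sketch of SpMM with a CSR sparse operand and row-major dense operands. It is not the authors' GPU kernel and does not implement merge-based load-balancing; the function name `csr_spmm` and its parameter names are hypothetical, chosen for illustration only. It shows why row-major storage matters: for each nonzero, the inner loop walks consecutive addresses in both B and C, which is the access pattern that neighbouring GPU threads could coalesce.

```python
def csr_spmm(row_ptr, col_idx, values, B, n_cols):
    """Compute C = A @ B with A in CSR form; B and C are flat row-major lists.

    Hypothetical reference routine for illustration, not the paper's kernel.
    """
    n_rows = len(row_ptr) - 1
    C = [0.0] * (n_rows * n_cols)
    for i in range(n_rows):
        c_row = i * n_cols  # offset of row i in the output C
        # The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            b_row = col_idx[k] * n_cols  # offset of row col_idx[k] in B
            # Consecutive j touch consecutive elements of B and C.
            for j in range(n_cols):
                C[c_row + j] += a * B[b_row + j]
    return C


# A = [[1, 0, 2],
#      [0, 0, 3]]  stored in CSR form
row_ptr = [0, 2, 3]
col_idx = [0, 2, 2]
values = [1.0, 2.0, 3.0]
# B = [[1, 2], [3, 4], [5, 6]]  stored row-major
B = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
C = csr_spmm(row_ptr, col_idx, values, B, 2)
print(C)  # [11.0, 14.0, 15.0, 18.0]
```

In this row-by-row form the work per row is proportional to its number of nonzeros, so rows of very different lengths produce uneven work; that imbalance is what the load-balancing ingredient in the abstract addresses.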


Pages: 672-687

Page count: 16
