Design Principles for Sparse Matrix Multiplication on the GPU

Cited by: 58
Authors
Yang, Carl [1 ,2 ]
Buluc, Aydin [2 ,3 ]
Owens, John D. [1 ,2 ]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
Euro-Par 2018: Parallel Processing (Springer LNCS)
Funding
US National Science Foundation
Keywords
Sparse matrix multiplication; Parallel; GPU
DOI
10.1007/978-3-319-96983-1_48
CLC number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access to both the input and output matrices and is crucial to achieving excellent SpMM performance. By combining these two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
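As a concrete illustration of ingredient (ii), the sketch below is a minimal CUDA kernel for CSR SpMM that uses the row-major coalesced access pattern the abstract describes: one thread block per sparse row, with threads striding across the dense columns so that consecutive threads read consecutive elements of B and write consecutive elements of C. The kernel name, launch configuration, and one-row-per-block scheme are illustrative assumptions for this record, not the authors' implementation; in particular, the sketch omits ingredient (i), the merge-based load-balancing.

#include <cstdio>
#include <cuda_runtime.h>

// Minimal CSR SpMM sketch: C = A * B, where A is M x K sparse (CSR)
// and B (K x N) and C (M x N) are dense, stored row-major.
// One thread block per sparse row; threads stride across the N columns,
// so consecutive threads touch consecutive elements of a row of B and C
// (the row-major coalesced access pattern the abstract highlights).
__global__ void spmm_csr_rowmajor(int M, int N,
                                  const int*   __restrict__ rowPtr,
                                  const int*   __restrict__ colIdx,
                                  const float* __restrict__ vals,
                                  const float* __restrict__ B,
                                  float*       __restrict__ C) {
    int row = blockIdx.x;
    if (row >= M) return;
    for (int col = threadIdx.x; col < N; col += blockDim.x) {
        float acc = 0.0f;
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
            // colIdx[k] and vals[k] are identical across the block
            // (broadcast loads); the B access is coalesced across threads.
            acc += vals[k] * B[colIdx[k] * N + col];
        }
        C[row * N + col] = acc;
    }
}

int main() {
    // Tiny example: A = [[1,0,2],[0,3,0]] (2x3), B = [[1,2],[3,4],[5,6]] (3x2).
    // Expected C = [[11,14],[9,12]].
    const int M = 2, N = 2;
    int   hRowPtr[] = {0, 2, 3};
    int   hColIdx[] = {0, 2, 1};
    float hVals[]   = {1.f, 2.f, 3.f};
    float hB[]      = {1.f, 2.f, 3.f, 4.f, 5.f, 6.f};
    float hC[M * N];

    int *dRowPtr, *dColIdx;
    float *dVals, *dB, *dC;
    cudaMalloc(&dRowPtr, sizeof(hRowPtr));
    cudaMalloc(&dColIdx, sizeof(hColIdx));
    cudaMalloc(&dVals,   sizeof(hVals));
    cudaMalloc(&dB,      sizeof(hB));
    cudaMalloc(&dC,      sizeof(hC));
    cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(dColIdx, hColIdx, sizeof(hColIdx), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals,   hVals,   sizeof(hVals),   cudaMemcpyHostToDevice);
    cudaMemcpy(dB,      hB,      sizeof(hB),      cudaMemcpyHostToDevice);

    spmm_csr_rowmajor<<<M, 128>>>(M, N, dRowPtr, dColIdx, dVals, dB, dC);
    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
    printf("%.0f %.0f\n%.0f %.0f\n", hC[0], hC[1], hC[2], hC[3]);
    return 0;
}

Compiled with nvcc, the example prints C = [[11, 14], [9, 12]]. The simple one-row-per-block scheme stalls when row lengths are skewed; the paper's merge-based load-balancing addresses exactly that by dividing the combined row-and-nonzero workload evenly across threads.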
Pages: 672-687
Page count: 16
Related Papers
50 records in total
  • [31] HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
    Li, Zhonggen
    Ke, Xiangyu
    Zhu, Yifan
    Gao, Yunjun
    Tu, Yaofeng
arXiv
  • [32] Performance Evaluation of Sparse Matrix-Vector Multiplication Using GPU/MIC Cluster
    Maeda, Hiroshi
    Takahashi, Daisuke
    PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2015, : 396 - 399
  • [33] Optimization of Sparse Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU
    Kubota, Yuji
    Takahashi, Daisuke
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2011, PT II, 2011, 6783 : 547 - 561
  • [34] Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU
    Nagasaka, Yusuke
    Nukada, Akira
    Matsuoka, Satoshi
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016), 2016, 80 : 131 - 142
  • [35] TaiChi: A Hybrid Compression Format for Binary Sparse Matrix-Vector Multiplication on GPU
    Gao, Jianhua
    Ji, Weixing
    Tan, Zhaonian
    Wang, Yizhuo
    Shi, Feng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3732 - 3745
  • [36] Reducing Vector I/O for Faster GPU Sparse Matrix-Vector Multiplication
    Nguyen Quang Anh Pham
    Fan, Rui
    Wen, Yonggang
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 1043 - 1052
  • [37] An efficient sparse stiffness matrix vector multiplication using compressed sparse row storage format on AMD GPU
    Xing, Longyue
    Wang, Zhaoshun
    Ding, Zhezhao
    Chu, Genshen
    Dong, Lingyu
    Xiao, Nan
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (23)
  • [38] Coded Sparse Matrix Multiplication
    Wang, Sinong
    Liu, Jiashang
    Shroff, Ness
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [39] Fast Sparse Matrix Multiplication
    Yuster, Raphael
    Zwick, Uri
    ACM TRANSACTIONS ON ALGORITHMS, 2005, 1 (01) : 2 - 13
  • [40] Fast sparse matrix multiplication
    Yuster, R
    Zwick, U
    ALGORITHMS ESA 2004, PROCEEDINGS, 2004, 3221 : 604 - 615