Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations

Cited by: 0

Authors:

Haonan Ji

Shibo Lu

Kaixi Hou

Hao Wang

Zhou Jin

Weifeng Liu

Brian Vinter

Affiliations:

[1] China University of Petroleum-Beijing, Super Scientific Software Laboratory, Department of Computer Science and Technology

[2] Virginia Tech, Department of Computer Science

[3] The Ohio State University, Department of Computer Science and Engineering

[4] Aarhus University, Faculty of Technical Sciences

Source:

International Journal of Parallel Programming | 2021 / Vol. 49

Keywords:

Parallel computing; Segmented merge; Sparse matrix; GPU

DOI:

Not available

Abstract:

Segmented operations, such as segmented sum, segmented scan, and segmented sort, are important building blocks for parallel irregular algorithms. In this work, we propose a new parallel primitive called segmented merge. It merges q sub-segments into p segments in parallel, where both the segments and the sub-segments may have nonuniform lengths, which easily causes load-balancing and vectorization problems on massively parallel processors such as GPUs. Our algorithm resolves these problems by first recording the boundaries of the segments and sub-segments, then assigning roughly the same number of elements to the GPU threads, and finally iteratively merging the sub-segments within each segment in a binary-tree fashion until only one sub-segment remains in each segment. We implement the segmented merge primitive on GPUs and demonstrate its efficiency on parallel sparse matrix transposition (SpTRANS) and sparse matrix–matrix multiplication (SpGEMM) operations. We compare our implementation with the NVIDIA vendor library on two GPUs. The experimental results show that our algorithm achieves on average 3.94× (up to 13.09×) and 2.89× (up to 109.15×) speedups on SpTRANS and SpGEMM, respectively.
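
The binary-tree merging step described in the abstract can be illustrated with a small sequential sketch. This is not the authors' GPU implementation: it runs on one thread and omits the load-balancing step entirely. The function name segmented_merge and the two offset arrays subseg_ptr and seg_ptr are hypothetical names chosen for this illustration, and each sub-segment is assumed to be sorted already.

```python
from heapq import merge as merge_sorted


def segmented_merge(values, subseg_ptr, seg_ptr):
    """Sequential reference sketch (hypothetical layout, not the paper's code).

    values     : flat list; every sub-segment is assumed to be sorted
    subseg_ptr : q + 1 offsets into values marking sub-segment boundaries
    seg_ptr    : p + 1 offsets into subseg_ptr marking which sub-segments
                 belong to each segment
    """
    out = list(values)
    for s in range(len(seg_ptr) - 1):
        # Boundaries of the sub-segments that make up segment s.
        bounds = subseg_ptr[seg_ptr[s]:seg_ptr[s + 1] + 1]
        # Each pass is one level of the binary tree: neighbours merge pairwise.
        while len(bounds) > 2:
            next_bounds = [bounds[0]]
            for i in range(0, len(bounds) - 1, 2):
                lo = bounds[i]
                mid = bounds[i + 1]
                # An unpaired trailing sub-segment is carried over unchanged.
                hi = bounds[i + 2] if i + 2 < len(bounds) else mid
                out[lo:hi] = list(merge_sorted(out[lo:mid], out[mid:hi]))
                next_bounds.append(hi)
            bounds = next_bounds
    return out


# Two segments: the first holds three sub-segments, the second holds two.
values = [1, 4, 7, 2, 5, 3, 6, 0, 9, 8]
subseg_ptr = [0, 3, 5, 7, 9, 10]
seg_ptr = [0, 3, 5]
result = segmented_merge(values, subseg_ptr, seg_ptr)
```

In this example the first segment becomes [1, 2, 3, 4, 5, 6, 7] and the second becomes [0, 8, 9]. A segment with k sub-segments needs about log2(k) passes, which is what makes the tree form attractive for parallel hardware.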

Pages: 732-744

Page count: 12
