A Jacobi_PCG solver for sparse linear systems on multi-GPU cluster

Cited by: 17

Authors:

Lin, Shaozhong [1,2]

Xie, Zhiqiang [1,2]

Affiliations:

[1] Changjiang River Sci Res Inst, Wuhan 430010, Peoples R China

[2] Res Ctr Water Engn Safety & Disaster Prevent MWR, Wuhan 430010, Peoples R China

Source:

JOURNAL OF SUPERCOMPUTING | 2017 / Vol. 73 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

JPCG; Sparse linear systems; Multi-GPU cluster; Communication reduction; Node reordering; Counting sort; Computation/communication overlapping;

DOI:

10.1007/s11227-016-1887-4

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

The General Purpose Graphics Processing Unit (GPGPU or GPU) has powerful floating-point computation ability and is suitable for intensive computing, such as solving large linear systems. The Jacobi Preconditioned Conjugate Gradient method (Jacobi_PCG or JPCG), a preconditioned iterative method for the numerical solution of large sparse linear systems, is highly parallel and is especially appropriate for implementation on GPUs. On a multi-GPU cluster, the matrix-vector multiplication in each PCG iteration needs vector entries generated by the current GPU and by other GPUs, so communication between GPUs becomes a major performance bottleneck. In this paper, we study the implementation of JPCG on a multi-GPU cluster. Considering the coarse-grained parallelism between GPUs and the sparsity of matrices arising from the finite element method (FEM), a simple and fast node reordering method is presented to reduce the bandwidth of the sparse matrices, which in turn reduces the communication between GPUs. This novel reordering method is based on integerized nodal coordinates of the FEM mesh and the counting sort algorithm. Additionally, computation and communication are overlapped using CUDA asynchronous memory transfer and MPI_sendrecv communication to further reduce the communication cost. A JPCG solver for multi-GPU clusters is developed using CUDA Fortran. Tests show that this solver has high efficiency and strong scalability.
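
For orientation, the Jacobi-preconditioned conjugate gradient iteration named in the abstract can be sketched as follows. This is a minimal single-process CPU illustration of the textbook algorithm, assuming a symmetric positive definite matrix in CSR storage with a nonzero diagonal; it is not the authors' CUDA Fortran multi-GPU solver, and the function names (`csr_matvec`, `dot`, `jacobi_pcg`) are hypothetical. The matrix-vector product inside the loop is the step that, per the abstract, requires vector entries from other GPUs in the distributed setting.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s
    return y


def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with Jacobi-preconditioned CG (illustrative sketch).

    Assumes A is symmetric positive definite with a nonzero diagonal.
    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)
    # Jacobi preconditioner: M^{-1} = diag(A)^{-1}.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                inv_diag[i] = 1.0 / data[k]

    x = [0.0] * n
    r = list(b)  # residual for the zero initial guess
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) / b_norm < tol:
        return x, 0

    z = [inv_diag[i] * r[i] for i in range(n)]  # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, it
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # 3x3 tridiagonal test matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form.
    indptr = [0, 2, 5, 7]
    indices = [0, 1, 0, 1, 2, 1, 2]
    data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    x, iterations = jacobi_pcg(indptr, indices, data, [1.0, 0.0, 1.0])
    print(x, iterations)
```

The preconditioner only needs the matrix diagonal, so applying it is an elementwise scaling with no data dependencies between entries, which is consistent with the abstract's remark that the method is highly parallel.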


Pages: 433 - 454

Number of pages: 22

50 items in total

[21] SparSol: sparse linear systems solver
Diyankov, O. V.
Koshelev, S. V.
Kotegov, S. S.
Krasnogorov, I. V.
Kuznetsova, N. N.
Pravilnikov, V. Y.
Beckner, B. L.
Maliassov, S. Y.
Mishev, I. D.
Usadi, A. K.
RUSSIAN JOURNAL OF NUMERICAL ANALYSIS AND MATHEMATICAL MODELLING, 2007, 22 (04) : 325 - 339
[22] Strength Check of Aircraft Parts Based on Multi-GPU Clusters for Fast Calculation of Sparse Linear Equations
Zhang, Yuhua
Hu, Binxing
IEEE ACCESS, 2020, 8 : 77188 - 77203
[23] New Generation of WIPL-D In-Core Multi-GPU Solver
Mrdakovic, Branko Lj.
Kostic, Milan M.
Olcan, Dragan I.
Kolundzija, Branko M.
2018 IEEE ANTENNAS AND PROPAGATION SOCIETY INTERNATIONAL SYMPOSIUM ON ANTENNAS AND PROPAGATION & USNC/URSI NATIONAL RADIO SCIENCE MEETING, 2018, : 413 - 414
[24] Solver of Multi-GPU Compressible Turbulence Parallel Simulations Used in Aerodynamic Teaching
Luo Kai
Cao Wenbin
Li Song
Song Limin
PROCEEDINGS OF THE 2016 4TH INTERNATIONAL CONFERENCE ON ELECTRICAL & ELECTRONICS ENGINEERING AND COMPUTER SCIENCE (ICEEECS 2016), 2016, 50 : 707 - 710
[25] Solution of Large Sparse System of Linear Equations over GF(2) on a Multi-Node Multi-GPU Platform
Rawal, Shruti
Gupta, Indivar
DEFENCE SCIENCE JOURNAL, 2022, 72 (06) : 836 - 845
[26] A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform
Ament, M.
Knittel, G.
Weiskopf, D.
Strasser, W.
PROCEEDINGS OF THE 18TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2010, : 583 - 592
[27] Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster
Xian, Wang
Takayuki, Aoki
PARALLEL COMPUTING, 2011, 37 (09) : 521 - 535
[28] Simulating cortical networks on heterogeneous multi-GPU systems
Nere, Andrew
Franey, Sean
Hashmi, Atif
Lipasti, Mikko
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2013, 73 (07) : 953 - 971
[29] Efficient Solving of Scan Primitive on Multi-GPU Systems
Dieguez, Adrian P.
Amor, Margarita
Doallo, Ramon
Nukada, Akira
Matsuoka, Satoshi
2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2018, : 794 - 803
[30] Linear Solver Performance in Elastoplastic Problem Solution on GPU Cluster
Khalevitsky, Yu. V.
Konovalov, A. V.
Burmasheva, N. V.
Partin, A. S.
MECHANICS, RESOURCE AND DIAGNOSTICS OF MATERIALS AND STRUCTURES (MRDMS-2017), 2017, 1915
