A Jacobi_PCG solver for sparse linear systems on multi-GPU cluster

Cited by: 17

Authors:

Lin, Shaozhong [1,2]

Xie, Zhiqiang [1,2]

Affiliations:

[1] Changjiang River Sci Res Inst, Wuhan 430010, Peoples R China

[2] Res Ctr Water Engn Safety & Disaster Prevent MWR, Wuhan 430010, Peoples R China

Source:

JOURNAL OF SUPERCOMPUTING | 2017 / Volume 73 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

JPCG; Sparse linear systems; Multi-GPU cluster; Communication reduction; Node reordering; Counting sort; Computation/communication overlapping;

DOI:

10.1007/s11227-016-1887-4

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The General Purpose Graphics Processing Unit (GPGPU or GPU) has powerful floating-point computation ability and is suitable for intensive computing, such as solving large linear systems. The Jacobi Preconditioned Conjugate Gradient method (Jacobi_PCG or JPCG), a preconditioned iterative method for the numerical solution of large sparse linear systems, is highly parallel and is especially appropriate for implementation on GPUs. On a multi-GPU cluster, the matrix-vector multiplication in the PCG iteration needs vector entries generated by the current GPU and by other GPUs, so communication between GPUs becomes a major performance bottleneck. In this paper, we study the implementation of the JPCG on a multi-GPU cluster. Considering the coarse-grained parallelism between GPUs and the sparsity of matrices arising from the finite element method (FEM), a simple and fast node reordering method is presented to reduce the bandwidth of the sparse matrices, which in turn reduces the communication between GPUs. This novel reordering method is based on integerized nodal coordinates of the FEM mesh and the counting sort algorithm. Additionally, computation and communication are overlapped using CUDA asynchronous memory transfer and MPI_sendrecv communication to further reduce the communication cost. A JPCG solver for multi-GPU clusters is developed using CUDA Fortran. Tests show that this solver has high efficiency and strong scalability.
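For readers unfamiliar with the method named in the abstract, the following is a minimal single-process sketch of the textbook Jacobi-preconditioned conjugate gradient iteration, written in plain Python. It is not the authors' CUDA Fortran solver and contains no GPU, MPI, or node-reordering logic; the function names `jacobi_pcg`, `matvec`, and `dot`, the row-dictionary matrix format, and the tolerance defaults are all assumptions made for illustration. The `matvec` call marks the step that, per the abstract, requires vector entries from other GPUs in a distributed setting.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Sparse matrix-vector product; A is a list of {column: value} rows (assumed format)."""
    return [sum(val * x[j] for j, val in row.items()) for row in A]


def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A with Jacobi-preconditioned CG.

    Returns the approximate solution and the number of iterations performed.
    """
    n = len(b)
    # Jacobi preconditioner: the inverse of the diagonal of A.
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                                  # residual b - A x with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)                                  # initial search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0             # avoid dividing by zero when b = 0

    for k in range(max_iter):
        # Stop when the relative residual norm is small enough.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(A, p)                        # the communication-heavy step on a cluster
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # Small symmetric, diagonally dominant test system (hypothetical data).
    A = [{0: 4.0, 1: -1.0}, {0: -1.0, 1: 3.0, 2: -1.0}, {1: -1.0, 2: 2.0}]
    b = [2.0, 2.0, 4.0]
    solution, iterations = jacobi_pcg(A, b)
    print(solution, iterations)
```

Because the preconditioner is only a diagonal scaling, every step apart from the matrix-vector product is an element-wise update or a reduction, which is consistent with the abstract's remark that the method is highly parallel.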


Pages: 433-454

Page count: 22

