A Jacobi_PCG solver for sparse linear systems on multi-GPU cluster

Cited by: 17
Authors
Lin, Shaozhong [1 ,2 ]
Xie, Zhiqiang [1 ,2 ]
Affiliations
[1] Changjiang River Sci Res Inst, Wuhan 430010, Peoples R China
[2] Res Ctr Water Engn Safety & Disaster Prevent MWR, Wuhan 430010, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2017 / Vol. 73 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
JPCG; Sparse linear systems; Multi-GPU cluster; Communication reduction; Node reordering; Counting sort; Computation/communication overlapping;
DOI
10.1007/s11227-016-1887-4
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The General Purpose Graphics Processing Unit (GPGPU or GPU) has powerful floating-point computation ability and is suitable for intensive computing, such as solving large linear systems. The Jacobi Preconditioned Conjugate Gradient method (Jacobi_PCG or JPCG), a preconditioned iterative method for the numerical solution of large sparse linear systems, is highly parallel and is especially appropriate for implementation on GPUs. On a multi-GPU cluster, the matrix-vector multiplication in the PCG iteration needs vector entries generated by the current GPU and by other GPUs, so communication between GPUs becomes a major performance bottleneck. In this paper, we study the implementation of JPCG on a multi-GPU cluster. Considering the coarse-grained parallelism between GPUs and the sparsity of matrices arising from the finite element method (FEM), a simple and fast node reordering method is presented to optimize the bandwidth of sparse matrices, which reduces the communication between GPUs. This novel reordering method is based on integerized nodal coordinates of the FEM mesh and the counting sort algorithm. Additionally, computation and communication are overlapped using CUDA asynchronous memory transfer and MPI_sendrecv communication to further reduce the communication cost. A JPCG solver for multi-GPU clusters is developed using CUDA Fortran. Tests show that this solver has high efficiency and strong scalability.
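For orientation, the Jacobi-preconditioned conjugate gradient iteration named in the abstract can be sketched on a single CPU as follows. This is a minimal illustration, not the authors' CUDA Fortran solver: the function names `jacobi_pcg`, `dot`, and `matvec`, the dense list-of-lists matrix storage, and the relative-residual stopping rule are all assumptions made for the example. The matrix is assumed to be symmetric positive definite with a nonzero diagonal.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product. Per the abstract, this is the step that
    needs vector entries held by other GPUs in a multi-GPU setting."""
    return [dot(row, v) for row in A]


def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with conjugate gradients preconditioned by diag(A).

    Hypothetical helper for illustration; returns (x, iterations_used).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter
```

Applying the preconditioner is an elementwise scaling by the inverse diagonal, which is why the method parallelizes well; the matrix-vector product is the only step that couples entries across the vector.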
Pages: 433 - 454
Number of pages: 22
Related papers
50 in total
  • [1] A Jacobi_PCG solver for sparse linear systems on multi-GPU cluster
    Shaozhong Lin
    Zhiqiang Xie
    The Journal of Supercomputing, 2017, 73 : 433 - 454
  • [2] A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver
    Ding, Nan
    Liu, Yang
    Williams, Samuel
    Li, Xiaoye S.
    PROCEEDINGS OF THE 2021 SIAM CONFERENCE ON APPLIED AND COMPUTATIONAL DISCRETE ALGORITHMS, ACDA21, 2021, : 147 - 159
  • [3] Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures
    Xie, Chenhao
    Chen, Jieyang
    Firoz, Jesun
    Li, Jiajia
    Song, Shuaiwen Leon
    Barker, Kevin
    Raugas, Mark
    Li, Ang
    50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2021,
  • [4] A parallel nonlinear multigrid solver for unsteady incompressible flow simulation on multi-GPU cluster
    Shi, Xiaolei
    Agrawal, Tanmay
    Lin, Chao-An
    Hwang, Feng-Nan
    Chiu, Tzu-Hsuan
    JOURNAL OF COMPUTATIONAL PHYSICS, 2020, 414
  • [5] Benchmarking multi-GPU applications on modern multi-GPU integrated systems
    Bernaschi, Massimo
    Agostini, Elena
    Rossetti, Davide
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (14):
  • [6] An improved direct linear equation solver using multi-GPU in multi-body dynamics
    Jung, Ji-Hyun
    Bae, Dae-Sung
    ADVANCES IN ENGINEERING SOFTWARE, 2018, 115 : 87 - 102
  • [7] Acoustic scattering solver based on single level FMM for multi-GPU systems
    Lopez-Portugues, Miguel
    Lopez-Fernandez, Jesus A.
    Menendez-Canal, Jonatan
    Rodriguez-Campa, Alberto
    Ranilla, Jose
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2012, 72 (09) : 1057 - 1064
  • [8] Modelling Multi-GPU Systems
    Spampinato, Daniele G.
    Elster, Anne C.
    Natvig, Thorvald
    PARALLEL COMPUTING: FROM MULTICORES AND GPU'S TO PETASCALE, 2010, 19 : 562 - 569
  • [9] Hybrid Multi-GPU Solver Based on Schur Complement Method
    Kopysov, Sergey
    Kuzmin, Igor
    Nedozhogin, Nikita
    Novikov, Alexander
    Sagdeeva, Yulia
    PARALLEL COMPUTING TECHNOLOGIES (PACT 2013), 2013, 7979 : 65 - 79
  • [10] Fast GMRES-GPU solver for large scale sparse linear systems
    Liu, Youquan
    Yin, Kangxue
    Wu, Enhua
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2011, 23 (04) : 553 - 560