A Jacobi_PCG solver for sparse linear systems on multi-GPU cluster

Cited by: 17
Authors
Lin, Shaozhong [1, 2]
Xie, Zhiqiang [1, 2]
Affiliations
[1] Changjiang River Sci Res Inst, Wuhan 430010, Peoples R China
[2] Res Ctr Water Engn Safety & Disaster Prevent MWR, Wuhan 430010, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2017 / Vol. 73 / No. 1
Funding
National Natural Science Foundation of China
Keywords
JPCG; Sparse linear systems; Multi-GPU cluster; Communication reduction; Node reordering; Counting sort; Computation/communication overlapping
DOI
10.1007/s11227-016-1887-4
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
The General Purpose Graphics Processing Unit (GPGPU or GPU) has powerful floating-point computation capability and is well suited to compute-intensive tasks such as solving large linear systems. The Jacobi Preconditioned Conjugate Gradient method (Jacobi_PCG or JPCG), a preconditioned iterative method for the numerical solution of large sparse linear systems, offers a high degree of parallelism and is therefore especially appropriate for implementation on GPUs. On a multi-GPU cluster, the matrix-vector multiplication in each PCG iteration needs vector entries computed both by the current GPU and by other GPUs, so communication between GPUs becomes a major performance bottleneck. In this paper, we study the implementation of JPCG on a multi-GPU cluster. Considering the coarse-grained parallelism between GPUs and the sparsity of matrices arising from the finite element method (FEM), we present a simple and fast node reordering method that reduces the bandwidth of the sparse matrices and thereby reduces the communication between GPUs. This novel reordering method is based on integerized nodal coordinates of the FEM mesh and the counting sort algorithm. Additionally, computation and communication are overlapped using CUDA asynchronous memory transfers and MPI_sendrecv communication to further reduce the communication cost. A JPCG solver for multi-GPU clusters is developed in CUDA Fortran. Tests show that this solver is highly efficient and exhibits strong scalability.
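The JPCG iteration itself can be illustrated with a minimal serial sketch. This is not the authors' CUDA Fortran solver: it assumes a symmetric positive-definite matrix with a nonzero diagonal stored in CSR form (`indptr`, `indices`, `data`), and the names `spmv`, `dot` and `jacobi_pcg` are hypothetical. Applying the Jacobi preconditioner is only an elementwise multiplication by the inverse diagonal, which is why the method parallelizes so well. In a multi-GPU version, `spmv` is the step that needs vector entries owned by other GPUs, and each `dot` would additionally need a global reduction.

```python
"""Minimal serial sketch of the Jacobi-preconditioned conjugate gradient method.

Assumptions: A is symmetric positive definite with a nonzero diagonal and is
stored in CSR form (indptr, indices, data). All names are illustrative only.
"""
from typing import List, Sequence, Tuple


def spmv(indptr: Sequence[int], indices: Sequence[int],
         data: Sequence[float], x: Sequence[float]) -> List[float]:
    """Sparse matrix-vector product y = A x for a CSR matrix.

    In a multi-GPU code this is the step that needs x entries owned by other GPUs.
    """
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product; a distributed version would need a global reduction here."""
    return sum(u * v for u, v in zip(a, b))


def jacobi_pcg(indptr: Sequence[int], indices: Sequence[int], data: Sequence[float],
               b: Sequence[float], tol: float = 1e-10,
               max_iter: int = 1000) -> Tuple[List[float], int]:
    """Solve A x = b by PCG with the Jacobi preconditioner M = diag(A).

    Returns (x, iterations). Stops when ||r|| <= tol * ||b||.
    """
    n = len(b)
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                inv_diag[i] = 1.0 / data[k]      # assumes a nonzero diagonal entry

    x = [0.0] * n
    r = list(b)                                   # r0 = b - A x0 with x0 = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^-1 r: elementwise, highly parallel
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # guard against b = 0

    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        q = spmv(indptr, indices, data, p)
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # 3x3 SPD tridiagonal matrix [[4,-1,0],[-1,4,-1],[0,-1,4]] in CSR form
    indptr = [0, 2, 5, 7]
    indices = [0, 1, 0, 1, 2, 1, 2]
    data = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
    b = [2.0, 4.0, 10.0]                          # equals A @ [1, 2, 3]
    x, iters = jacobi_pcg(indptr, indices, data, b)
    print([round(v, 6) for v in x], iters)        # x should be close to [1.0, 2.0, 3.0]
```

Because the preconditioner is diagonal, every operation except `spmv` and the inner products is purely local to each row, which matches the abstract's point that the matrix-vector product is where inter-GPU communication arises.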
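The reordering idea (integerized nodal coordinates plus counting sort) and its effect on communication can be illustrated with a small sketch. The abstract does not give the exact sort key, so this sketch assumes, hypothetically, that each node is keyed by its coordinate along one chosen axis, integerized into cells of width `h`, and that matrix rows are split into contiguous blocks, one per GPU. The names `integerize`, `counting_sort_order`, `inverse`, `bandwidth` and `halo_sizes` are illustrative only. Counting sort renumbers n nodes in O(n + k) time for k integer cells without traversing the mesh graph, and because it is stable, nodes in the same cell keep their input order.

```python
"""Sketch: node renumbering by integerized coordinate + counting sort.

Assumptions (hypothetical, not from the paper): each node is keyed by its
coordinate along one chosen axis, integerized into cells of width h; the
matrix rows are then split into contiguous blocks, one block per GPU.
"""
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]  # nodes sharing an element -> nonzeros (a, b) and (b, a)


def integerize(coords: Sequence[float], h: float) -> List[int]:
    """Map each coordinate to a non-negative integer cell index of width h."""
    lo = min(coords)
    return [int((c - lo) // h) for c in coords]


def counting_sort_order(keys: Sequence[int]) -> List[int]:
    """Return node ids sorted by key with a stable counting sort in O(n + k)."""
    k = max(keys) + 1
    start = [0] * (k + 1)
    for key in keys:                   # histogram, shifted by one slot
        start[key + 1] += 1
    for i in range(k):                 # prefix sum: start[key] = first output slot
        start[i + 1] += start[i]
    order = [0] * len(keys)
    for node, key in enumerate(keys):  # stable: ties keep their input order
        order[start[key]] = node
        start[key] += 1
    return order


def inverse(order: Sequence[int]) -> List[int]:
    """new_index[old_id] = position of old_id in the sorted order."""
    new_index = [0] * len(order)
    for new, old in enumerate(order):
        new_index[old] = new
    return new_index


def bandwidth(edges: Sequence[Edge], new_index: Sequence[int]) -> int:
    """Matrix bandwidth: max |i - j| over the off-diagonal nonzeros."""
    return max(abs(new_index[a] - new_index[b]) for a, b in edges)


def halo_sizes(edges: Sequence[Edge], new_index: Sequence[int],
               n: int, parts: int) -> List[int]:
    """Count, per row block (GPU), the distinct x entries owned by other blocks."""
    rows_per_part = -(-n // parts)     # ceiling division
    needed: List[Set[int]] = [set() for _ in range(parts)]
    for a, b in edges:
        i, j = new_index[a], new_index[b]
        for row, col in ((i, j), (j, i)):  # structurally symmetric pattern
            if row // rows_per_part != col // rows_per_part:
                needed[row // rows_per_part].add(col)
    return [len(s) for s in needed]


if __name__ == "__main__":
    # Six nodes on a line at x = 0..5, numbered out of order.
    x = [3.0, 0.0, 5.0, 1.0, 4.0, 2.0]
    edges = [(1, 3), (3, 5), (5, 0), (0, 4), (4, 2)]
    identity = list(range(len(x)))
    new_index = inverse(counting_sort_order(integerize(x, 1.0)))
    print(bandwidth(edges, identity), bandwidth(edges, new_index))  # 5 1
    print(halo_sizes(edges, identity, 6, 2),
          halo_sizes(edges, new_index, 6, 2))                       # [3, 3] [1, 1]
```

On this toy line mesh the bandwidth drops from 5 to 1, and with two row blocks each block needs one off-block vector entry instead of three. For a 2D or 3D mesh, sorting on a single axis groups nodes into slabs; if `h` is at least the element size along that axis, edges only join the same or adjacent cells, so the bandwidth stays below the node count of two neighbouring slabs rather than shrinking to 1.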
Pages: 433-454
Number of pages: 22