A Jacobi_PCG solver for sparse linear systems on multi-GPU cluster

Cited by: 17
Authors
Lin, Shaozhong [1 ,2 ]
Xie, Zhiqiang [1 ,2 ]
Affiliations
[1] Changjiang River Sci Res Inst, Wuhan 430010, Peoples R China
[2] Res Ctr Water Engn Safety & Disaster Prevent MWR, Wuhan 430010, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2017 / Vol. 73 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
JPCG; Sparse linear systems; Multi-GPU cluster; Communication reduction; Node reordering; Counting sort; Computation/communication overlapping;
DOI
10.1007/s11227-016-1887-4
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The general-purpose graphics processing unit (GPGPU or GPU) has powerful floating-point computation capability and is well suited to compute-intensive tasks such as solving large linear systems. The Jacobi Preconditioned Conjugate Gradient method (Jacobi_PCG or JPCG), a preconditioned iterative method for the numerical solution of large sparse linear systems, is highly parallel and is especially well suited to implementation on GPUs. On a multi-GPU cluster, the matrix-vector multiplication in each PCG iteration needs vector entries generated by both the current GPU and other GPUs, so communication between GPUs becomes a major performance bottleneck. In this paper, we study the implementation of JPCG on a multi-GPU cluster. Considering the coarse-grained parallelism between GPUs and the sparsity of matrices arising from the finite element method (FEM), a simple and fast node reordering method is presented to reduce the bandwidth of the sparse matrices, which in turn reduces the communication between GPUs. This novel reordering method is based on integerized nodal coordinates of the FEM mesh and the counting sort algorithm. Additionally, computation and communication are overlapped using CUDA asynchronous memory transfers and MPI_Sendrecv communication to further reduce the communication cost. A JPCG solver for multi-GPU clusters is developed using CUDA Fortran. Tests show that this solver has high efficiency and strong scalability.
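The two ideas named in the abstract, counting-sort node reordering and the Jacobi-preconditioned CG iteration, can be illustrated with a minimal single-process sketch. This is not the paper's CUDA Fortran implementation: there is no GPU or MPI code here, the function names are hypothetical, and the sort key (the integerized x-coordinate only, with a user-chosen `cell` size) is an assumption made for illustration, since the abstract does not say how the integer keys are built.

```python
def counting_sort_order(keys, num_keys):
    """Return node indices in stable sorted order of integer keys in [0, num_keys)."""
    counts = [0] * (num_keys + 1)
    for k in keys:
        counts[k + 1] += 1
    for i in range(num_keys):            # prefix sums give each key's first slot
        counts[i + 1] += counts[i]
    order = [0] * len(keys)
    for idx, k in enumerate(keys):
        order[counts[k]] = idx
        counts[k] += 1
    return order


def reorder_nodes(coords, cell):
    """Order nodes by integerized x-coordinate (assumed key, see lead-in)."""
    xmin = min(x for x, _ in coords)
    keys = [int((x - xmin) / cell) for x, _ in coords]
    return counting_sort_order(keys, max(keys) + 1)


def bandwidth(edges, order):
    """Largest index distance between connected nodes under a given ordering."""
    new_index = {old: new for new, old in enumerate(order)}
    return max(abs(new_index[a] - new_index[b]) for a, b in edges)


def jpcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned CG for an SPD matrix stored as a list of {col: value} rows."""
    n = len(b)

    def matvec(v):
        return [sum(a * v[j] for j, a in A[i].items()) for i in range(n)]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                    # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

In a multi-GPU setting, `matvec` is the step that would need vector entries owned by other devices. A smaller matrix bandwidth means each row block touches fewer remote entries, which is why the reordering reduces communication.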
Pages: 433-454
Page count: 22