Scalable parallel linear solver for compact banded systems on heterogeneous architectures

Authors
Song, Hang [1 ]
Matsuno, Kristen V. [1 ]
West, Jacob R. [1 ]
Subramaniam, Akshay [2 ]
Ghate, Aditya S. [2 ]
Lele, Sanjiva K. [1 ,2 ]
Affiliations
[1] Stanford Univ, Dept Mech Engn, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA 94305 USA
Funding
US National Science Foundation
Keywords
Compact banded system; Periodic boundary; Parallel cyclic reduction; Distributed memory; Parallel computing; Block tridiagonal systems; Large-eddy simulation; Cyclic reduction; Difference schemes; Flow
DOI
10.1016/j.jcp.2022.111443
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
A scalable algorithm for solving compact banded linear systems on distributed-memory architectures is presented. The proposed method factorizes the original system across two levels of the memory hierarchy and solves it using parallel cyclic reduction on both distributed and shared memory. This method has a smaller communication footprint across distributed-memory partitions than conventional algorithms that involve data transposes or re-partitioning. The algorithm developed in this work is generalized to cyclic compact banded systems with flexible data decompositions. For cyclic compact banded systems, the method is a direct solver with deterministic operation and communication counts that depend on the matrix size, its bandwidth, and the partition strategy. Implementation and runtime configuration details are discussed for performance optimization. Scalability is demonstrated both for the linear solver itself and for a representative fluid mechanics application, in which the dominant computational cost is solving the cyclic tridiagonal linear systems of compact numerical schemes on a 3D periodic domain. The algorithm is particularly useful for solving the linear systems that arise when compact finite difference operators are applied to a wide range of partial differential equation problems, including, but not limited to, numerical simulations of compressible turbulent flows, aeroacoustics, elastic-plastic wave propagation, and electromagnetics. It alleviates obstacles to the use of these operators on modern high-performance computing hardware, where memory and computational power are distributed across nodes with multi-threaded processing units. (c) 2022 Elsevier Inc. All rights reserved.
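The abstract names parallel cyclic reduction (PCR) as the core kernel and cyclic tridiagonal systems from compact schemes as the dominant cost. As a rough single-process illustration only, the sketch below applies PCR to one cyclic tridiagonal system with NumPy, using `np.roll` for the periodic wrap-around. It assumes a power-of-two system size and a strictly diagonally dominant matrix (so no pivot vanishes); it is not the authors' two-level distributed/shared-memory algorithm, and the function name `pcr_cyclic_tridiagonal` is hypothetical. Each of the log2(n) reduction steps updates every row independently, which is the property that makes PCR attractive on parallel hardware.

```python
import numpy as np


def pcr_cyclic_tridiagonal(a, b, c, d):
    """Solve a cyclic (periodic) tridiagonal system by parallel cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], indices taken mod n.
    Sketch assumptions: n is a power of two (n >= 2) and no zero pivots occur,
    e.g. the matrix is strictly diagonally dominant.
    """
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    n = b.size
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two, n >= 2")
    s = 1
    while 2 * s < n:
        # Coefficients of rows i-s and i+s, gathered with periodic wrap-around.
        a_m, b_m, c_m, d_m = (np.roll(v, s) for v in (a, b, c, d))
        a_p, b_p, c_p, d_p = (np.roll(v, -s) for v in (a, b, c, d))
        k1 = a / b_m  # multiplier that eliminates x[i-s] using row i-s
        k2 = c / b_p  # multiplier that eliminates x[i+s] using row i+s
        # All rows are updated at once; row i now couples x[i-2s], x[i], x[i+2s].
        a, b, c, d = (
            -k1 * a_m,
            b - k1 * c_m - k2 * a_p,
            -k2 * c_p,
            d - k1 * d_m - k2 * d_p,
        )
        s *= 2
    # Now s == n/2, so x[i-s] and x[i+s] are the same unknown x[j], j = i + n/2.
    off = a + c
    off_j, b_j, d_j = (np.roll(v, -s) for v in (off, b, d))
    # Solve the independent 2x2 systems for (x[i], x[j]) by Cramer's rule.
    return (d * b_j - off * d_j) / (b * b_j - off * off_j)


# Usage: a sixth-order compact first derivative on a periodic grid
# (Lele's tridiagonal scheme with alpha = 1/3, a = 14/9, b = 1/9).
n = 64
h = 2 * np.pi / n
x = np.arange(n) * h
f = np.sin(x)
rhs = ((14 / 9) * (np.roll(f, -1) - np.roll(f, 1)) / (2 * h)
       + (1 / 9) * (np.roll(f, -2) - np.roll(f, 2)) / (4 * h))
dfdx = pcr_cyclic_tridiagonal(np.full(n, 1 / 3), np.ones(n), np.full(n, 1 / 3), rhs)
```

The usage lines pick the sixth-order tridiagonal compact first-derivative scheme as a test case because its left-hand matrix (1/3, 1, 1/3 with periodic wrap) is exactly the kind of cyclic tridiagonal system the abstract describes; `dfdx` should closely approximate `cos(x)`.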
Pages: 16