Scalable parallel linear solver for compact banded systems on heterogeneous architectures

Times cited: 1

Authors:

Song, Hang [1]
Matsuno, Kristen V. [1]
West, Jacob R. [1]
Subramaniam, Akshay [2]
Ghate, Aditya S. [2]
Lele, Sanjiva K. [1,2]

Affiliations:

[1] Stanford Univ, Dept Mech Engn, Stanford, CA 94305 USA

[2] Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA 94305 USA

Source:

JOURNAL OF COMPUTATIONAL PHYSICS | 2022 / Vol. 468

Funding:

US National Science Foundation (NSF)

Keywords:

Compact banded system; Periodic boundary; Parallel cyclic reduction; Distributed memory; Parallel computing; BLOCK TRIDIAGONAL SYSTEMS; LARGE-EDDY SIMULATION; CYCLIC REDUCTION; DIFFERENCE SCHEMES; FLOW;

DOI:

10.1016/j.jcp.2022.111443

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

A scalable algorithm for solving compact banded linear systems on distributed memory architectures is presented. The proposed method factorizes the original system into two levels of the memory hierarchy and solves it using parallel cyclic reduction on both distributed and shared memory. This method has a lower communication footprint across distributed memory partitions than conventional algorithms involving data transposes or re-partitioning. The algorithm developed in this work is generalized to cyclic compact banded systems with flexible data decompositions. For cyclic compact banded systems, the method is a direct solver with deterministic operation and communication counts that depend on the matrix size, its bandwidth, and the partition strategy. Implementation and runtime configuration details are discussed for performance optimization. Scalability is demonstrated on the linear solver itself as well as on a representative fluid mechanics application, in which the dominant computational cost is solving the cyclic tridiagonal linear systems of compact numerical schemes on a 3D periodic domain. The algorithm is particularly useful for solving the linear systems arising from the application of compact finite difference operators to a wide range of partial differential equation problems, including, but not limited to, numerical simulations of compressible turbulent flows, aeroacoustics, elastic-plastic wave propagation, and electromagnetics. It alleviates obstacles to the use of these operators on modern high performance computing hardware, where memory and computational power are distributed across nodes with multi-threaded processing units. (c) 2022 Elsevier Inc. All rights reserved.
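For readers unfamiliar with parallel cyclic reduction (PCR), the following is a minimal single-process sketch of textbook PCR applied to a cyclic (periodic) tridiagonal system, the case the abstract highlights. It is not the authors' two-level distributed/shared-memory algorithm: it only emulates the data-parallel update of all rows with NumPy array operations. It assumes that the system size n is a power of two and that no pivot vanishes during reduction (for example, a strictly diagonally dominant matrix). The function name `pcr_cyclic_tridiag` and the (a, b, c, d) coefficient layout are illustrative choices, not taken from the paper.

```python
import numpy as np


def pcr_cyclic_tridiag(a, b, c, d):
    """Solve a cyclic tridiagonal system by parallel cyclic reduction (PCR).

    Row i reads  a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with indices taken modulo n (periodic wrap-around).

    Sketch assumptions: n is a power of two, and no pivot b becomes zero
    during reduction (e.g. the matrix is strictly diagonally dominant).
    """
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    n = b.size
    assert n > 0 and (n & (n - 1)) == 0, "sketch assumes n is a power of two"

    s = 1  # current coupling stride: row i couples x[i-s], x[i], x[i+s]
    while s < n:
        # Rows i-s and i+s for every i at once (np.roll wraps periodically).
        a_m, b_m, c_m, d_m = (np.roll(v, s) for v in (a, b, c, d))
        a_p, b_p, c_p, d_p = (np.roll(v, -s) for v in (a, b, c, d))
        alpha = -a / b_m  # multiplier that eliminates x[i-s] from row i
        gamma = -c / b_p  # multiplier that eliminates x[i+s] from row i
        # All rows are updated independently: this is the data-parallel step.
        # Afterwards row i couples x[i-2s], x[i], x[i+2s].
        a, b, c, d = (
            alpha * a_m,
            b + alpha * c_m + gamma * a_p,
            gamma * c_p,
            d + alpha * d_m + gamma * d_p,
        )
        s *= 2

    # Now s == n, so x[i-s] and x[i+s] are x[i] itself: each row is decoupled.
    return d / (a + b + c)
```

Each of the log2(n) reduction steps updates every row independently using only rows i - s and i + s. This is what makes PCR attractive on parallel hardware, at the cost of O(n log n) total work instead of the O(n) of the serial Thomas algorithm. A quick sanity check is to assemble the dense periodic matrix and compare the result against `numpy.linalg.solve`.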

Pages: 16