A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver

Cited by: 0

Authors:

Ding, Nan [1]

Liu, Yang [2]

Williams, Samuel [1]

Li, Xiaoye S. [2]

Affiliations:

[1] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA

[2] Lawrence Berkeley Natl Lab, Scalable Solvers Grp, Berkeley, CA 94720 USA

Source:

PROCEEDINGS OF THE 2021 SIAM CONFERENCE ON APPLIED AND COMPUTATIONAL DISCRETE ALGORITHMS, ACDA21 | 2021

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Sparse triangular solve (SpTRSV) is used in conjunction with sparse LU factorization for solving sparse linear systems, either as a direct solver or as a preconditioner. As GPUs have become a first-class compute citizen, designing an efficient and scalable SpTRSV on multi-GPU HPC systems is imperative. In this paper, we leverage the GPU-initiated data transfers of NVSHMEM to implement and evaluate a multi-GPU SpTRSV. We create a novel producer-consumer paradigm to manage the computation and communication in SpTRSV and implement it using two CUDA streams. Our multi-GPU SpTRSV implementation using CUDA streams achieves a 3.7x speedup when using twelve GPUs (two nodes) relative to our implementation on a single GPU, and up to 6.1x compared to cuSPARSE csrsv2() over the range of one to eighteen GPUs. To further explain the observed performance and to identify the matrix features that determine the potential benefit of using multiple GPUs, we extend the critical path model of SpTRSV to GPUs. We demonstrate that our performance model helps explain various aspects of performance and performance bottlenecks on multi-GPU systems and motivates code optimizations.

Pages: 147-159

Page count: 13
