A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver

Cited by: 0
Authors
Ding, Nan [1]
Liu, Yang [2]
Williams, Samuel [1]
Li, Xiaoye S. [2]
Affiliations
[1] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
[2] Lawrence Berkeley Natl Lab, Scalable Solvers Grp, Berkeley, CA 94720 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Sparse triangular solve (SpTRSV) is used in conjunction with sparse LU factorization to solve sparse linear systems, either as a direct solver or as a preconditioner. As GPUs have become first-class compute citizens, designing an efficient and scalable SpTRSV for multi-GPU HPC systems is imperative. In this paper, we leverage the GPU-initiated data transfers of NVSHMEM to implement and evaluate a multi-GPU SpTRSV. We create a novel producer-consumer paradigm to manage the computation and communication in SpTRSV and implement it using two CUDA streams. Our multi-GPU SpTRSV implementation using CUDA streams achieves a 3.7x speedup on twelve GPUs (two nodes) relative to our implementation on a single GPU, and up to 6.1x compared to cuSPARSE csrsv2() over the range of one to eighteen GPUs. To further explain the observed performance and to identify the matrix features that determine the potential benefit of using multiple GPUs, we extend the critical path model of SpTRSV to GPUs. We demonstrate that our performance model helps explain various aspects of performance and performance bottlenecks on multi-GPU systems and motivates code optimizations.
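For readers unfamiliar with the operation, the following is a minimal sequential sketch of a lower-triangular SpTRSV in CSR format. It is not the paper's multi-GPU implementation or its performance model. The function name sptrsv_csr_lower and the example matrix are hypothetical, and the per-row dependency level computed here is only a simple stand-in for the idea of a critical path: row i cannot be solved until every row it references has been solved, so the deepest dependency chain bounds the available parallelism.

```python
def sptrsv_csr_lower(indptr, indices, data, b):
    """Solve L x = b by forward substitution.

    L is lower triangular, stored in CSR with its diagonal present.
    Returns (x, level), where level[i] is the length of the longest
    chain of rows that must be solved before row i.
    """
    n = len(b)
    x = [0.0] * n
    level = [0] * n
    for i in range(n):
        acc = float(b[i])
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                # Off-diagonal entry: row i depends on the solved x[j].
                acc -= data[k] * x[j]
                level[i] = max(level[i], level[j] + 1)
            elif j == i:
                diag = data[k]
        if diag is None or diag == 0:
            raise ValueError(f"missing or zero diagonal in row {i}")
        x[i] = acc / diag
    return x, level


# Hypothetical 4x4 example:
#   [2 0 0 0]
#   [1 1 0 0]
#   [0 0 4 0]
#   [0 3 1 2]
indptr = [0, 1, 3, 4, 7]
indices = [0, 0, 1, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 2.0]
b = [2.0, 3.0, 8.0, 11.0]

x, level = sptrsv_csr_lower(indptr, indices, data, b)
critical_path_length = max(level) + 1  # rows on the longest dependency chain
```

In this example rows 0 and 2 have no dependencies and could be solved concurrently, while row 3 must wait for rows 1 and 2. Rows that share a level are independent of each other, which is the property that parallel SpTRSV schemes in general exploit.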
Pages: 147-159
Page count: 13
Related papers
50 in total
  • [31] An efficient parallel collaborative filtering algorithm on multi-GPU platform
    Zhongya Wang
    Ying Liu
    Steve Chiu
    The Journal of Supercomputing, 2016, 72 : 2080 - 2094
  • [32] A Massively Parallel and Scalable Multi-GPU Material Point Method
    Wang, Xinlei
    Qiu, Yuxing
    Slattery, Stuart R.
    Fang, Yu
    Li, Minchen
    Zhu, Song-Chun
    Zhu, Yixin
    Tang, Min
    Manocha, Dinesh
    Jiang, Chenfanfu
    ACM TRANSACTIONS ON GRAPHICS, 2020, 39 (04):
  • [33] Performance Analysis of Parallel FFT on Large Multi-GPU Systems
    Ayala, Alan
    Tomov, Stan
    Stoyanov, Miroslav
    Haidar, Azzam
    Dongarra, Jack
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 372 - 381
  • [34] Distributed Multi-GPU Accelerated Hybrid Parallel Rendering for Massively Parallel Environment
    Cao, Yi
    Wang, Huawei
    Ai, Zhiwei
    2014 INTERNATIONAL CONFERENCE ON VIRTUAL REALITY AND VISUALIZATION (ICVRV2014), 2014, : 30 - 36
  • [35] An improved direct linear equation solver using multi-GPU in multi-body dynamics
    Jung, Ji-Hyun
    Bae, Dae-Sung
    ADVANCES IN ENGINEERING SOFTWARE, 2018, 115 : 87 - 102
  • [36] A multi-GPU finite volume solver for magnetohydrodynamics-based solar wind simulations
    Wang, Yuan
    Feng, Xueshang
    Zhou, Yufen
    Gan, Xinbiao
    COMPUTER PHYSICS COMMUNICATIONS, 2019, 238 : 181 - 193
  • [37] A Parallel Implementation of JPEG2000 Encoder on Multi-GPU System
    Kim, Bumho
    Lee, Jeong-Woo
    Yoon, Ki-Song
    2014 16TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT), 2014, : 610 - 613
  • [38] Multi-GPU Accelerated Parallel Algorithm of Wallis Transformation for Image Enhancement
    Xiao, Han
    Song, Yu-Pu
    Zhou, Qing-Lei
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2014, 7 (02): 99 - 114
  • [39] Multi-GPU Parallel Memetic Algorithm for Capacitated Vehicle Routing Problem
    Wodecki, Mieczyslaw
    Bozejko, Wojciech
    Karpinski, Michal
    Pacut, Maciej
    PARALLEL PROCESSING AND APPLIED MATHEMATICS (PPAM 2013), PT II, 2014, 8385 : 207 - 214
  • [40] Application of the CUDA® Toolkit Multi-GPU Libraries to an Out-of-Core MoM Solver
    Saxerud, Alexander L.
    Ferrell, Jack P.
    Dunn, Eric A.
    2016 IEEE ANTENNAS AND PROPAGATION SOCIETY INTERNATIONAL SYMPOSIUM, 2016, : 2013 - 2014