Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems

Authors
Sukkari, Dalal [1]
Ltaief, Hatem [1]
Keyes, David [1]
Faverge, Mathieu [2]
Affiliations
[1] King Abdullah Univ Sci & Technol, Extreme Comp Res Ctr, Jeddah 23955, Saudi Arabia
[2] Univ Bordeaux, CNRS, Inria, Bordeaux INP, F-33400 Talence, France
Keywords
QR FACTORIZATION; ALGORITHMS;
DOI
10.1109/cluster.2019.8891024
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper describes a task-based implementation of the polar decomposition on massively parallel systems using the PARSEC dynamic runtime system. Based on a formulation of the iterative QR Dynamically-Weighted Halley (QDWH) algorithm, our novel implementation reduces data traffic while exploiting the high concurrency of the underlying hardware architecture. First, we replace the most time-consuming phase, the classical QR factorization, with a new hierarchical variant customized for the specific structure of the matrix during the QDWH iterations. The newly developed hierarchical QR for QDWH not only exploits the matrix structure, but also shortens the critical path to maximize hardware occupancy. We then deploy PARSEC to seamlessly orchestrate, pipeline, and track the data dependencies of the various linear algebra building blocks involved in the iterative QDWH algorithm. PARSEC overlaps communication with computation thanks to its asynchronous scheduling of fine-grained computational tasks. It employs look-ahead techniques to further expose parallelism, while actively pursuing the critical path. In addition, we identify synergistic opportunities between the task-based QDWH algorithm and the PARSEC framework, and exploit them during the hierarchical QR factorization to enforce locality-aware task execution. This locality-aware execution minimizes expensive inter-node communication, which is one of the main bottlenecks when scaling applications on distributed-memory systems. We report numerical accuracy and performance results using well- and ill-conditioned matrices. The benchmarking campaign reveals up to 2X performance speedup against the existing state-of-the-art implementation of the polar decomposition on 36,864 cores.
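As a rough illustration of the QR-based QDWH iteration the abstract refers to, the following is a minimal dense, single-node NumPy sketch. It is not the paper's task-based PARSEC implementation and does not use a hierarchical QR. The function names (`qdwh_weights`, `qdwh_polar`), the tolerance, and the iteration cap are assumptions made for this example, and the singular-value bounds are taken from a full SVD purely for simplicity, where a production code would estimate them cheaply.

```python
import numpy as np


def qdwh_weights(l):
    """Dynamic Halley weights (a, b, c) for a lower bound l in (0, 1]
    on the smallest singular value of the current iterate."""
    l2 = l * l
    d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
    s = np.sqrt(1.0 + d)
    a = s + 0.5 * np.sqrt(8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * s))
    b = (a - 1.0) ** 2 / 4.0
    c = a + b - 1.0
    return a, b, c


def qdwh_polar(A, tol=1e-10, max_iter=20):
    """Polar decomposition A = U @ H of a real, full-column-rank m x n
    matrix (m >= n) via the QR-based dynamically weighted Halley iteration.
    Returns (U, H, iterations). Illustrative sketch only."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape

    # Assumption for this sketch: exact bounds from an SVD. This defeats
    # the purpose in practice, where cheap norm/condition estimates are used.
    sigma = np.linalg.svd(A, compute_uv=False)
    alpha = sigma[0]            # upper bound on the largest singular value
    l = sigma[-1] / sigma[0]    # lower bound on sigma_min(A / alpha)

    X = A / alpha
    eye = np.eye(n)
    iterations = 0
    for _ in range(max_iter):
        iterations += 1
        a, b, c = qdwh_weights(l)

        # QR factorization of the stacked matrix [sqrt(c) * X; I].
        # The identity block is the structure a tailored QR can exploit.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, eye]))
        Q1, Q2 = Q[:m], Q[m:]

        X_new = (b / c) * X + ((a - b / c) / np.sqrt(c)) * (Q1 @ Q2.T)

        # Update the lower bound; clamp against rounding above 1.
        l = min(1.0, l * (a + b * l * l) / (1.0 + c * l * l))

        diff = np.linalg.norm(X_new - X, "fro")
        X = X_new
        if diff < tol:
            break

    U = X
    H = U.T @ A
    H = 0.5 * (H + H.T)  # symmetrize to remove rounding asymmetry
    return U, H, iterations
```

Calling `qdwh_polar(A)` on a small random matrix returns an orthonormal factor `U` and a symmetric factor `H` with `U @ H` close to `A`. The stacked QR factorization in each iteration is the step that dominates the cost, which is why the paper targets it with a structure-aware hierarchical variant.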
Pages: 69-80
Page count: 12
Related papers
50 in total
  • [1] Massively Parallel Polar Decomposition on Distributed-memory Systems
    Ltaief, Hatem
    Sukkari, Dalal
    Esposito, Aniello
    Nakatsukasa, Yuji
    Keyes, David
    ACM TRANSACTIONS ON PARALLEL COMPUTING, 2019, 6 (01)
  • [2] Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures
    Sukkari, Dalal
    Ltaief, Hatem
    Faverge, Mathieu
    Keyes, David
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (02) : 312 - 323
  • [3] Parallelization Using Task Parallel Library with Task-Based Programming Model
    Hei, Xinhong
    Zhang, Jinlong
    Wang, Bin
    Jin, Haiyan
    Giacaman, Nasser
    2014 5TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2014, : 653 - 656
  • [4] Task-Based Decomposition of Factored POMDPs
    Shani, Guy
    IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (02) : 208 - 216
  • [5] Task-Based Cholesky Decomposition on Knights Corner Using OpenMP
    Dorris, Joseph
    Kurzak, Jakub
    Luszczek, Piotr
    YarKhan, Asim
    Dongarra, Jack
    HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2016 INTERNATIONAL WORKSHOPS, 2016, 9945 : 544 - 562
  • [6] Improving parallel executions by increasing task granularity in task-based runtime systems using acyclic DAG clustering
    Bramas, Berenger
    Ketterlin, Alain
    PEERJ COMPUTER SCIENCE, 2020, PeerJ Inc. (2020) : 1 - 26
  • [7] PaRSEC: Scalability, flexibility, and hybrid architecture support for task-based applications in ECP
    Bouteiller, Aurelien
    Herault, Thomas
    Cao, Qinglei
    Schuchart, Joseph
    Bosilca, George
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2025, 39 (01) : 147 - 166
  • [8] Dynamics of task-based confidence in schizophrenia using seasonal decomposition approach
    Badal, Varsha D.
    Depp, Colin A.
    Pinkham, Amy E.
    Harvey, Philip D.
    SCHIZOPHRENIA RESEARCH-COGNITION, 2023, 32
  • [9] Termination Checking and Task Decomposition for Task-Based Intermittent Programs
    Colin, Alexei
    Lucia, Brandon
    CC'18: PROCEEDINGS OF THE 27TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION, 2018, : 116 - 127
  • [10] A Parallel Task-based Approach to Linear Algebra
    Tousimojarad, Ashkan
    Vanderbauwhede, Wim
    2014 IEEE 13TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC), 2014, : 59 - 66