Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems

Cited by: 0
Authors
Sukkari, Dalal [1]
Ltaief, Hatem [1]
Keyes, David [1]
Faverge, Mathieu [2]
Affiliations
[1] King Abdullah Univ Sci & Technol, Extreme Comp Res Ctr, Jeddah 23955, Saudi Arabia
[2] Univ Bordeaux, CNRS, Inria, Bordeaux INP, F-33400 Talence, France
Keywords
QR FACTORIZATION; ALGORITHMS;
DOI
10.1109/cluster.2019.8891024
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper describes how to leverage a task-based implementation of the polar decomposition on massively parallel systems using the PARSEC dynamic runtime system. Based on a formulation of the iterative QR Dynamically-Weighted Halley (QDWH) algorithm, our novel implementation reduces data traffic while exploiting high concurrency from the underlying hardware architecture. First, we replace the most time-consuming classical QR factorization phase with a new hierarchical variant, customized for the specific structure of the matrix during the QDWH iterations. The newly developed hierarchical QR for QDWH not only exploits the matrix structure but also shortens the critical path to maximize hardware occupancy. We then deploy PARSEC to seamlessly orchestrate, pipeline, and track the data dependencies of the various linear algebra building blocks involved in the iterative QDWH algorithm. PARSEC overlaps communication with computation thanks to its asynchronous scheduling of fine-grained computational tasks. It employs look-ahead techniques to further expose parallelism, while actively pursuing the critical path. In addition, we identify synergistic opportunities between the task-based QDWH algorithm and the PARSEC framework. We exploit them during the hierarchical QR factorization to enforce a locality-aware task execution. This locality awareness minimizes expensive inter-node communication, which represents one of the main bottlenecks for scaling up applications on challenging distributed-memory systems. We report numerical accuracy and performance results using well-conditioned and ill-conditioned matrices. The benchmarking campaign reveals up to a 2X speedup over the existing state-of-the-art implementation of the polar decomposition on 36,864 cores.
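For orientation, the QR-based QDWH iteration named in the abstract can be sketched in a few lines of dense NumPy. This is a minimal sketch of the textbook formulation only, not the paper's hierarchical, tiled PARSEC implementation. The function name `qdwh_polar`, the tolerance, and the SVD-based estimate of the smallest singular value are assumptions made for illustration. The stacked matrix with an identity block at the bottom is the kind of structure a customized QR can exploit.

```python
import numpy as np

def qdwh_polar(A, tol=1e-12, max_iter=50):
    """Polar decomposition A = U @ H via the QR-based QDWH iteration (dense sketch)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    alpha = np.linalg.norm(A, "fro")      # Frobenius norm is an upper bound on ||A||_2
    X = A / alpha
    # Lower bound on the smallest singular value of X. Assumption: computed
    # exactly through an SVD for brevity; a production code would use a cheap
    # condition estimator instead.
    l = 0.9 * np.linalg.norm(X, -2)
    I = np.eye(n)
    for _ in range(max_iter):
        l = min(l, 1.0)                   # guard against rounding above 1
        l2 = l * l
        # Dynamically chosen Halley weights a, b, c.
        d = (4.0 * (1.0 - l2) / l2**2) ** (1.0 / 3.0)
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * np.sqrt(1.0 + d)))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # QR of the stacked matrix [sqrt(c) X; I]; the bottom block is the identity.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, I]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        l = l * (a + b * l2) / (1.0 + c * l2)   # updated lower bound
        delta = np.linalg.norm(X_new - X, "fro")
        X = X_new
        if delta < tol:
            break
    U = X                                  # orthonormal polar factor
    H = U.T @ A
    H = 0.5 * (H + H.T)                    # symmetrize the positive semidefinite factor
    return U, H

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
U, H = qdwh_polar(A)
```

After the call, `U` has orthonormal columns and `U @ H` reproduces `A` up to rounding error.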
Pages: 69-80
Number of pages: 12
Related papers
50 in total
  • [41] Mesh Variants for Massively Parallel Systems Using MATLAB
    Nasir, Faizan
    Bokhari, Mohammad Ubaidullah
    Samad, Abdus
    ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY AND COMPUTING, AICTC 2021, 2022, 392 : 227 - 234
  • [42] Mitigating the NUMA effect on task-based runtime systems
    Marcos Maroñas
    Antoni Navarro
    Eduard Ayguadé
    Vicenç Beltran
    The Journal of Supercomputing, 2023, 79 : 14287 - 14312
  • [43] Optimisation problems for dynamic concurrent task-based systems
    Verkest, D
    Yang, P
    Wong, C
    Marchal, P
    ICCAD 2001: IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, DIGEST OF TECHNICAL PAPERS, 2001, : 265 - 268
  • [44] A Task-Based Design Approach for Augmented Reality Systems
    Pribeanu, Costin
    Vilkonis, Rytis
    Iordache, Dragos Daniel
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 25, 2007, 25 : 293 - +
  • [45] The consistency of task-based authorization constraints in workflow systems
    Tan, KJ
    Crampton, J
    Gunter, CA
    17TH IEEE COMPUTER SECURITY FOUNDATIONS WORKSHOP, PROCEEDINGS, 2004, : 155 - 169
  • [46] Fast approximation algorithms for task-based runtime systems
    Beaumont, Olivier
    Eyraud-Dubois, Lionel
    Kumar, Suraj
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (17):
  • [47] The TaPaSCo Open-Source Toolflow for the Automated Composition of Task-Based Parallel Reconfigurable Computing Systems
    Carsten Heinz
    Jaco Hofmann
    Jens Korinth
    Lukas Sommer
    Lukas Weber
    Andreas Koch
    Journal of Signal Processing Systems, 2021, 93 : 545 - 563
  • [48] A temporal decomposition method for identifying venous effects in task-based fMRI
    Kay, Kendrick
    Jamison, Keith W.
    Zhang, Ru-Yuan
    Ugurbil, Kamil
    NATURE METHODS, 2020, 17 (10) : 1033 - +
  • [49] Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime
    Alcides Fonseca
    Bruno Cabral
    João Rafael
    Ivo Correia
    International Journal of Parallel Programming, 2016, 44 : 1337 - 1358
  • [50] Managing Failures in Task-Based Parallel Workflows in Distributed Computing Environments
    Ejarque, Jorge
    Bertran, Marta
    Cid-Fuentes, Javier Alvarez
    Conejero, Javier
    Badia, Rosa M.
    EURO-PAR 2020: PARALLEL PROCESSING, 2020, 12247 : 411 - 425