Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems

Cited by: 0
Authors
Sukkari, Dalal [1]
Ltaief, Hatem [1]
Keyes, David [1]
Faverge, Mathieu [2]
Affiliations
[1] King Abdullah Univ Sci & Technol, Extreme Comp Res Ctr, Jeddah 23955, Saudi Arabia
[2] Univ Bordeaux, CNRS, Inria, Bordeaux INP, F-33400 Talence, France
Keywords
QR FACTORIZATION; ALGORITHMS;
DOI
10.1109/cluster.2019.8891024
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper describes how to leverage a task-based implementation of the polar decomposition on massively parallel systems using the PARSEC dynamic runtime system. Based on a formulation of the iterative QR Dynamically-Weighted Halley (QDWH) algorithm, our novel implementation reduces data traffic while exploiting high concurrency from the underlying hardware architecture. First, we replace the most time-consuming classical QR factorization phase with a new hierarchical variant, customized for the specific structure of the matrix during the QDWH iterations. The newly developed hierarchical QR for QDWH not only exploits the matrix structure but also shortens the critical path to maximize hardware occupancy. We then deploy PARSEC to seamlessly orchestrate, pipeline, and track the data dependencies of the various linear algebra building blocks involved in the iterative QDWH algorithm. PARSEC overlaps communications with computations thanks to its asynchronous scheduling of fine-grained computational tasks. It employs look-ahead techniques to further expose parallelism, while actively pursuing the critical path. In addition, we identify synergistic opportunities between the task-based QDWH algorithm and the PARSEC framework. We exploit them during the hierarchical QR factorization to enforce a locality-aware task execution. This feature minimizes the expensive inter-node communication, which is one of the main bottlenecks for scaling up applications on challenging distributed-memory systems. We report numerical accuracy and performance results using well-conditioned and ill-conditioned matrices. The benchmarking campaign reveals up to 2X performance speedup against the existing state-of-the-art implementation for the polar decomposition on 36,864 cores.
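To make the structure mentioned in the abstract concrete, the following is a minimal dense NumPy sketch of a QR-based QDWH iteration. It is not the paper's tiled, hierarchical, PARSEC-scheduled implementation. The weight formulas follow the commonly published QDWH formulation as recalled here, and the function name `qdwh_polar`, the real-valued full-column-rank input, and the use of an exact SVD to obtain the scaling and the initial lower bound are all assumptions made for illustration. The point to notice is the stacked matrix `[sqrt(c) * X; I]`: its lower block is an identity, which is the kind of structure a customized QR factorization can exploit.

```python
import numpy as np


def qdwh_polar(A, max_iter=30):
    """Sketch: polar decomposition A = U @ H of a real m x n matrix (m >= n, full rank)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Assumption: exact singular values supply the scaling alpha and the
    # initial lower bound l; practical codes use cheap estimates instead.
    s = np.linalg.svd(A, compute_uv=False)
    alpha, l = s[0], s[-1] / s[0]
    X = A / alpha
    eye = np.eye(n)
    tol = np.cbrt(5 * np.finfo(float).eps)
    for _ in range(max_iter):
        # Dynamically chosen weights a, b, c from the current bound l.
        l2 = l * l
        d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        sq = np.sqrt(1.0 + d)
        a = sq + 0.5 * np.sqrt(8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * sq))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # QR of the stacked matrix; the lower block is the identity.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, eye]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + ((a - b / c) / np.sqrt(c)) * (Q1 @ Q2.T)
        # Update the lower bound on the smallest singular value of X.
        l = min(1.0, l * (a + b * l2) / (1.0 + c * l2))
        step = np.linalg.norm(X_new - X, "fro")
        X = X_new
        if step <= tol:
            break
    H = X.T @ A
    return X, 0.5 * (H + H.T)  # orthonormal factor, symmetrized H
```

Calling `U, H = qdwh_polar(A)` on a small random matrix should give `U` with orthonormal columns and a symmetric `H` such that `U @ H` reproduces `A` up to rounding. The sketch makes no attempt at tiling, task scheduling, or communication avoidance, which are the actual subject of the paper.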
Pages: 69-80
Number of pages: 12