Using Dynamic Broadcasts to Improve Task-Based Runtime Performances

Cited by: 6
Authors
Denis, Alexandre [1, 2]
Jeannot, Emmanuel [1, 2]
Swartvagher, Philippe [1, 2]
Thibault, Samuel [1, 2]
Affiliations
[1] Inria Bordeaux Sud Ouest, F-33405 Talence, France
[2] Univ Bordeaux, LaBRI, F-33405 Talence, France
Keywords
Task-based runtime systems; Communications; Collective; Broadcast;
DOI
10.1007/978-3-030-57675-2_28
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Task-based runtimes have emerged in the HPC world to take advantage of the computing power of heterogeneous supercomputers and to achieve scalability. One of the main bottlenecks for scalability is the communication layer. Some task-based algorithms need to send the same data to multiple nodes. To optimize this communication pattern, libraries offer dedicated routines, such as MPI_Bcast. However, the requirements of MPI_Bcast do not fit well with the constraints of task-based runtime systems: the broadcast must be performed simultaneously by all involved nodes, and these nodes must know each other, which is not possible when each node runs a task scheduler that is not synchronized with the others. In this paper, we propose a new approach, called dynamic broadcasts, to overcome these constraints. The broadcast communication pattern required by the task-based algorithm is detected automatically; the broadcasting algorithm then relies on active messages and source routing, so that participating nodes do not need to know each other and do not need to synchronize. A receiver receives the data the same way it receives a point-to-point communication, without having to know that the data arrives through a broadcast. We have implemented the algorithm in the StarPU runtime system using the NewMadeleine communication library. We performed benchmarks using the Cholesky factorization, which is known to use broadcasts, and observed up to a 30% improvement in its total execution time.
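The source-routing idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the StarPU or NewMadeleine implementation: the abstract does not say which tree shape is used, so a binomial-style split is assumed here, and the names `simulate_broadcast`, `deliveries`, and `sends` are hypothetical. Each message carries the list of recipients the receiver becomes responsible for, so only the root needs to know the full recipient set.

```python
from collections import deque


def simulate_broadcast(root, recipients, payload):
    """Simulate a source-routed broadcast (assumed binomial-style split).

    Returns (deliveries, sends): deliveries maps each recipient to the
    payload it received, sends lists every (sender, receiver) message.
    """
    deliveries = {}
    sends = []
    # Each queue entry is a node that holds the payload, together with
    # the recipients it is responsible for reaching.
    queue = deque([(root, list(recipients))])
    while queue:
        node, pending = queue.popleft()
        while pending:
            mid = len(pending) // 2
            # The upper half is delegated: its first node gets the
            # payload plus the routing list for the rest of that half.
            next_hop, delegated = pending[mid], pending[mid + 1:]
            pending = pending[:mid]
            sends.append((node, next_hop))
            deliveries[next_hop] = payload
            queue.append((next_hop, delegated))
    return deliveries, sends


if __name__ == "__main__":
    deliveries, sends = simulate_broadcast(0, [1, 2, 3, 4, 5, 6, 7], "tile")
    for sender, receiver in sends:
        print(f"node {sender} -> node {receiver}")
```

In this sketch a receiver only sees an incoming message with a payload and a routing list, which mirrors the abstract's point that receivers need neither prior knowledge of the other participants nor a synchronized call.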
Pages: 443-457
Page count: 15
Related papers
50 in total
  • [1] Design for a Soft Error Resilient Dynamic Task-based Runtime
    Cao, Chongxiao
    Herault, Thomas
    Bosilca, George
    Dongarra, Jack
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 765 - 774
  • [2] Evaluating Dynamic Task Scheduling in a Task-Based Runtime System for Heterogeneous Architectures
    Becker, Thomas
    Karl, Wolfgang
    Schuele, Tobias
    ARCHITECTURE OF COMPUTING SYSTEMS - ARCS 2019, 2019, 11479 : 142 - 155
  • [3] A Hardware Runtime for Task-Based Programming Models
    Tan, Xubin
    Bosch, Jaume
    Alvarez, Carlos
    Jimenez-Gonzalez, Daniel
    Ayguade, Eduard
    Valero, Mateo
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2019, 30 (09) : 1932 - 1946
  • [4] Assembly Operations for Multicore Architectures Using Task-Based Runtime Systems
    Genet, Damien
    Guermouche, Abdou
    Bosilca, George
    EURO-PAR 2014: PARALLEL PROCESSING WORKSHOPS, PT II, 2014, 8806 : 338 - 350
  • [5] Adaptive scheduling of collocated applications using a task-based runtime system
    Dokulil, Jiri
    Benkner, Siegfried
    2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018, : 41 - 48
  • [6] Mitigating the NUMA effect on task-based runtime systems
    Maronas, Marcos
    Navarro, Antoni
    Ayguade, Eduard
    Beltran, Vicenc
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (13): : 14287 - 14312
  • [7] Mitigating the NUMA effect on task-based runtime systems
    Marcos Maroñas
    Antoni Navarro
    Eduard Ayguadé
    Vicenç Beltran
    The Journal of Supercomputing, 2023, 79 : 14287 - 14312
  • [8] Implementing the Broadcast Operation in a Distributed Task-based Runtime
    Ceccato, Rodrigo
    Yviquel, Herve
    Pereira, Marcio
    Souza, Alan
    Araujo, Guido
    2022 IEEE 34TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING WORKSHOPS (SBAC-PADW 2022), 2022, : 25 - 32
  • [9] Fast approximation algorithms for task-based runtime systems
    Beaumont, Olivier
    Eyraud-Dubois, Lionel
    Kumar, Suraj
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (17):
  • [10] Flexible Data Redistribution in a Task-Based Runtime System
    Cao, Qinglei
    Bosilca, George
    Wu, Wei
    Zhong, Dong
    Bouteiller, Aurelien
    Dongarra, Jack
    2020 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2020), 2020, : 221 - 225