POSTER: The Legio Fault Resilience Framework: Design and Rationale

Authors
Rocco, Roberto [1]
Palermo, Gianluca [1]
Affiliation
[1] Politecnico di Milano, Milan, Italy
Keywords
HPC; MPI; ULFM; Fault Tolerance; Checkpoint/Restart
DOI
10.1145/3587135.3592180
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The increasing size of HPC clusters makes fault management mandatory. The current MPI standard does not specify the behaviour of an application after a fault occurs, which precludes any portable recovery solution. In this work, we present Legio, a framework that leverages the functionalities of the ULFM (User-Level Failure Mitigation) extension to introduce fault resilience properties into MPI applications.
Pages: 205-206
Number of pages: 2
Related papers (50 in total)
  • [31] Poster: RF Based Entropy Sources for Jamming Resilience
    Prakash, Jay
    Liu, Chenxi
    Quek, Tony Q. S.
    Lee, Jemin
    PROCEEDINGS OF THE 2019 CONFERENCE ON SECURITY AND PRIVACY IN WIRELESS AND MOBILE NETWORKS (WISEC '19), 2019, : 300 - 301
  • [32] Poster: Software Fault Localization as a Service (SFLaaS)
    Sarhan, Qusay Idrees
    Hassan, Hassan Bapeer
    Beszedes, Arpad
    2023 IEEE CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION, ICST, 2023, : 482 - 485
  • [33] Poster: Identification of Methods with Low Fault Risk
    Niedermayr, Rainer
    Roehm, Tobias
    Wagner, Stefan
    PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING - COMPANION (ICSE-COMPANION), 2018, : 390 - 391
  • [34] Poster: An Assessment Framework for Edge Applications
    Wagner, Martin
    Gedeon, Julien
    Skaisgiris, Karolis
    Brandherm, Florian
    Muehlhaeuser, Max
    2020 IEEE/ACM SYMPOSIUM ON EDGE COMPUTING (SEC 2020), 2020, : 184 - 186
  • [35] Rationale and design
    Bell, James
    DRUG AND ALCOHOL REVIEW, 2023, 42 : S20 - S20
  • [36] Poster: Adapter Framework for VANET Simulators
    Ebers, Sebastian
    Fischer, Stefan
    2014 IEEE VEHICULAR NETWORKING CONFERENCE (VNC), 2014,
  • [37] Poster: Predicting the Fault Revelation Utility of Mutants
    Chekam, Thierry Titcheu
    Papadakis, Mike
    Bissyande, Tegawende
    Le Traon, Yves
    PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING - COMPANION (ICSE-COMPANION), 2018, : 408 - 409
  • [38] A framework to introduce urban flood resilience into the design of flood control alternatives
    Rezende, Osvaldo Moura
    Ribeiro da Cruz de Franco, Anna Beatriz
    Beleno de Oliveira, Antonio Krishnamurti
    Pitzer Jacob, Ana Caroline
    Miguez, Marcelo Gomes
    JOURNAL OF HYDROLOGY, 2019, 576 : 478 - 493
  • [39] A Framework to Design Consumer-Centric Operational Strategies for Resilience Enhancement
    Poudel, Shiva
    Yu, Min Gyung
    Mukherjee, Monish
    Hanif, Sarmad
    Hardy, Trevor D.
    Reeve, Hayden M.
    IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS, 2024, 60 (02) : 2332 - 2343
  • [40] A principle-based approach to the design of a graduate resilience curriculum framework
    van Kessel, Gisela
    Brewer, Margo
    Lane, Murray
    Cooper, Berni
    Naumann, Fiona
    HIGHER EDUCATION RESEARCH & DEVELOPMENT, 2022, 41 (04) : 1325 - 1339