POSTER: The Legio Fault Resilience Framework: Design and Rationale

Authors
Rocco, Roberto [1]
Palermo, Gianluca [1]
Affiliations
[1] Politecnico di Milano, Milan, Italy
Keywords
HPC; MPI; ULFM; Fault Tolerance; Checkpoint/Restart
DOI
10.1145/3587135.3592180
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The increasing size of HPC clusters makes fault management mandatory. The current MPI standard does not specify the behaviour of an application after a fault occurs, which rules out any solution within the standard itself. In this work, we present Legio, a framework that leverages the functionalities of the ULFM extension to introduce fault resilience properties in MPI applications.
Pages: 205-206
Page count: 2