Fault-Tolerant Parallel Execution of Workflows with Deadlines

Cited by: 1
Authors
Eitschberger, Patrick [1]
Keller, Joerg [1]
Affiliations
[1] Fernuniv, Fac Math & Comp Sci, D-58084 Hagen, Germany
DOI
10.1109/PDP.2017.30
CLC number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Workflows of dependent tasks are a widespread model for parallel applications, often statically scheduled prior to application. Static schedules can tolerate processor failures due to permanent faults by placing duplicate tasks during the scheduling process. Schedules for workflows with deadlines can be extended to include frequency scaling information to optimize energy consumption. Frequency scaling can also be used in case of a fault to minimize its effect on the schedule makespan, albeit at the price of additional energy consumption. We investigate the interplay between these two parameters and quantify the energy increase to be expected in case of a fault for a given makespan increase. This knowledge enables the user to inform the scheduler about the makespan increase that is tolerable in case of a fault, where tolerable includes both the related performance aspects and the expected increase in energy. To achieve this, we model small task graphs from a benchmark suite as integer linear programs and use a solver to determine energy-optimal schedules for the fault-free case and for all possible fault positions with several levels of makespan increase. We present averages and distribution depending on makespan increase for a processor with a hypothetical power profile. Additionally, we present two heuristics that modify task frequency settings in case of a fault in order to restrict the makespan increase to a given value. Comparison with optimal frequency settings from the benchmark suite indicates that the heuristics incur only a small energy overhead.
Pages: 78-84
Page count: 7
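
The trade-off between tolerated makespan increase and extra energy that the abstract describes can be illustrated with a toy calculation. The sketch below is not the authors' integer linear program, nor either of their two heuristics. It assumes a single processor, a hypothetical power model in which dynamic power grows with the cube of the frequency, and a hypothetical set of discrete frequency levels; all function names, parameters, and numbers are made up for illustration.

```python
def energy(cycles, freq):
    """Energy to execute `cycles` of work at frequency `freq`.

    Assumption (hypothetical power model): power = freq**3, and the
    runtime is cycles / freq, so energy = cycles * freq**2.
    """
    return cycles * freq ** 2


def lowest_feasible_frequency(remaining_cycles, elapsed, deadline, levels):
    """Pick the lowest frequency level that still meets `deadline`.

    `remaining_cycles` is the work left after the fault (including any
    work that has to be repeated), `elapsed` is the time already spent.
    Returns None if even the highest level is too slow.
    """
    time_left = deadline - elapsed
    if time_left <= 0:
        return None
    for freq in sorted(levels):
        if remaining_cycles / freq <= time_left:
            return freq
    return None


# Hypothetical scenario: 12 units of work remain after a fault at time 4.
levels = [1.0, 1.5, 2.0, 2.5]
remaining, elapsed = 12, 4

# No makespan increase tolerated: the original makespan of 10 must hold.
f_strict = lowest_feasible_frequency(remaining, elapsed, 10, levels)

# A 20 % makespan increase tolerated: the deadline moves from 10 to 12.
f_relaxed = lowest_feasible_frequency(remaining, elapsed, 12, levels)

print(f_strict, energy(remaining, f_strict))
print(f_relaxed, energy(remaining, f_relaxed))
```

Under these assumptions, relaxing the deadline lets the remaining work run at a lower frequency level and therefore with less energy, which is the kind of relationship the paper quantifies with optimal schedules rather than with a greedy choice like this one.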