Fault-Tolerant Parallel Execution of Workflows with Deadlines

Cited by: 1
Authors
Eitschberger, Patrick [1]
Keller, Joerg [1]
Affiliations
[1] Fernuniv, Fac Math & Comp Sci, D-58084 Hagen, Germany
DOI
10.1109/PDP.2017.30
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Workflows of dependent tasks are a widespread model for parallel applications and are often statically scheduled prior to execution. Static schedules can tolerate processor failures due to permanent faults by placing duplicate tasks during the scheduling process. Schedules for workflows with deadlines can be extended to include frequency scaling information to optimize energy consumption. Frequency scaling can also be used in case of a fault to minimize its effect on the schedule makespan, although at the price of additional energy consumption. We investigate the interplay between these two parameters and quantify the energy increase to be expected for a fault and a given makespan increase. This knowledge enables the user to inform the scheduler about the makespan increase that is tolerable in case of a fault, where tolerable covers both the related performance aspects and the expected increase in energy. To achieve this, we model small task graphs from a benchmark suite as integer linear programs and use a solver to determine energy-optimal schedules for the fault-free case and for all possible fault positions at several levels of makespan increase. We present averages and distributions as a function of makespan increase for a processor with a hypothetical power profile. Additionally, we present two heuristics that modify task frequency settings in case of a fault in order to restrict the makespan increase to a given value. Comparison with optimal frequency settings from the benchmark suite indicates that the heuristics incur only a small energy overhead.
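The trade-off described in the abstract, between the makespan increase tolerated after a fault and the extra energy spent on frequency scaling, can be illustrated with a small sketch. It is not the authors' integer linear program or either of their heuristics. The discrete frequency levels, the cubic power model, the single surviving processor and all workload numbers are assumptions chosen only for illustration.

```python
# Illustrative sketch only: the frequency levels, the cubic power model and the
# single-processor workload below are assumptions, not data from the paper.
FREQS_GHZ = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical discrete frequency levels


def energy(gigacycles, f_ghz):
    """Energy in joules, assuming power = f^3 watts and runtime = cycles / f."""
    runtime_s = gigacycles / f_ghz
    return (f_ghz ** 3) * runtime_s


def lowest_feasible_frequency(total_gigacycles, budget_s):
    """Return the lowest level that finishes the work within budget_s, or None."""
    for f in FREQS_GHZ:
        if total_gigacycles / f <= budget_s:
            return f
    return None


def energy_after_fault(work, duplicate_work, deadline_s, tolerated_increase):
    """Frequency and energy when one processor also runs the duplicate tasks.

    tolerated_increase is the relative makespan increase accepted after a
    fault, for example 0.2 for 20 %.
    """
    total = sum(work) + sum(duplicate_work)
    budget = deadline_s * (1.0 + tolerated_increase)
    f = lowest_feasible_frequency(total, budget)
    if f is None:
        raise ValueError("no frequency level meets the relaxed deadline")
    return f, energy(total, f)


work = [2.0, 3.0, 1.0]   # giga-cycles of the processor's own tasks (made up)
duplicates = [3.0]       # giga-cycles of duplicates activated by the fault
deadline = 6.0           # seconds, fault-free deadline

f0 = lowest_feasible_frequency(sum(work), deadline)
e0 = energy(sum(work), f0)
print(f"fault-free: f = {f0} GHz, energy = {e0:.2f} J")
for increase in (0.0, 0.2, 0.5):
    f, e = energy_after_fault(work, duplicates, deadline, increase)
    print(f"tolerated increase {increase:.0%}: f = {f} GHz, energy = {e:.2f} J")
```

Under these assumptions, a larger tolerated makespan increase lets the processor stay at a lower frequency level, so the energy penalty of the fault shrinks. Because the levels are discrete, small changes in the tolerated increase may not change the chosen frequency at all.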
Pages: 78-84
Number of pages: 7