Fault-tolerant parallel scheduling of tasks on a heterogeneous high-performance workstation cluster

Cited by: 2
Author
Kwok, YK [1]
Affiliation
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Hong Kong, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2001 / Vol. 19 / Issue 03
Keywords
parallel algorithms; cluster computing; heterogeneous systems; fault-tolerant scheduler; task graphs; neighborhood search
DOI
10.1023/A:1011186732749
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., using an ATM switch at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search which works by refining multiple different initial schedules simultaneously using different workstations. The workstations communicate periodically to exchange their best solutions found thus far in order to direct the search to more promising regions in the search space. Heterogeneity of machines is exploited by the biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant in that the workload of a failed workstation is automatically redistributed to other workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our on-going development of SSI middleware for a Sun workstation cluster.
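The search loop described in the abstract can be illustrated with a minimal single-process sketch. This is not the CBS implementation from the paper: the record gives no details, so every name here (`makespan`, `refine`, `cbs_sketch`, `speeds`, `budget`, `failed_at`), the fixed communication delay, the move rule (reassign one random task), and the exchange rule (every searcher restarts from the best schedule found so far) are assumptions chosen for illustration. Searchers stand in for workstations and run in turn rather than in parallel.

```python
import random

def makespan(assign, order, cost, preds, comm):
    """Finish time when tasks run in topological `order`, task t on machine assign[t]."""
    ready = [0.0] * len(cost[0])          # time at which each machine becomes free
    finish = {}
    for t in order:
        m = assign[t]
        # Assumption: a predecessor on a different machine adds a fixed delay `comm`.
        waits = [finish[p] + (comm if assign[p] != m else 0.0) for p in preds[t]]
        start = max([ready[m]] + waits)
        finish[t] = start + cost[t][m]    # cost[t][m]: run time of task t on machine m
        ready[m] = finish[t]
    return max(finish.values())

def refine(assign, steps, rng, evaluate, n_machines):
    """Random neighborhood search: move one random task, keep non-worsening moves."""
    best, best_len = list(assign), evaluate(assign)
    for _ in range(steps):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.randrange(n_machines)
        cand_len = evaluate(cand)
        if cand_len <= best_len:
            best, best_len = cand, cand_len
    return best, best_len

def cbs_sketch(cost, preds, order, speeds, rounds=5, budget=200, comm=1.0,
               failed_at=None, seed=0):
    """Hypothetical driver: `speeds` has one entry per simulated searcher."""
    rng = random.Random(seed)
    n_tasks, n_machines = len(cost), len(cost[0])

    def evaluate(a):
        return makespan(a, order, cost, preds, comm)

    alive = list(range(len(speeds)))
    # Each searcher starts from a different random initial schedule.
    current = {w: [rng.randrange(n_machines) for _ in range(n_tasks)] for w in alive}
    best, best_len = None, float("inf")
    for r in range(rounds):
        # Simulated failure: failed_at = (round, searcher). Survivors absorb its share
        # because the budget below is split only among searchers still alive.
        if failed_at and r == failed_at[0] and failed_at[1] in alive and len(alive) > 1:
            alive.remove(failed_at[1])
        total = sum(speeds[w] for w in alive)
        for w in alive:
            # Biased partitioning (assumed form): faster searchers get more steps.
            steps = max(1, round(budget * speeds[w] / total))
            current[w], length = refine(current[w], steps, rng, evaluate, n_machines)
            if length < best_len:
                best, best_len = list(current[w]), length
        # Periodic exchange: all searchers continue from the best schedule so far.
        for w in alive:
            current[w] = list(best)
    return best, best_len

if __name__ == "__main__":
    cost = [[2, 4], [3, 1], [2, 2], [1, 3]]          # 4 tasks, 2 machines
    preds = {0: [], 1: [0], 2: [0], 3: [1, 2]}       # diamond-shaped task graph
    print(cbs_sketch(cost, preds, [0, 1, 2, 3], speeds=[1.0, 2.0, 4.0],
                     failed_at=(2, 1)))
```

Restarting every searcher from the shared best keeps the sketch short but reduces diversity; a closer approximation would likely let each searcher keep its own region of the search space and only use the exchanged solution as guidance.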
Pages: 299-314
Page count: 16
Related papers
50 in total
  • [21] Fault-tolerant high-performance matrix multiplication: Theory and practice
    Gunnels, JA
    Katz, DS
    Quintana-Ortí, ES
    van de Geijn, RA
    INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS, 2001, : 47 - 56
  • [22] A novel fault-tolerant scheduling algorithm for precedence constrained tasks in real-time heterogeneous systems
    Qin, Xiao
    Jiang, Hong
    PARALLEL COMPUTING, 2006, 32 (5-6) : 331 - 356
  • [23] An efficient fault-tolerant scheduling algorithm for real-time tasks with precedence constraints in heterogeneous systems
    Qin, X
    Jiang, H
    Swanson, DR
    2002 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDING, 2002, : 360 - 368
  • [24] Parallel Simulation of Tasks Scheduling and Scheduling Criteria in High-performance Computing Systems
    Skrinarova, Jarmila
    Povinsky, Michal
    JOURNAL OF INFORMATION AND ORGANIZATIONAL SCIENCES, 2019, 43 (02) : 211 - 228
  • [25] Fault-tolerant scheduling of fine-grained tasks in grid environments
    Wrzesinska, G
    van Nieuwpoort, RV
    Maassen, J
    Kielmann, T
    Bal, HE
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2006, 20 (01) : 103 - 114
  • [26] Energy-Efficient Fault-Tolerant Scheduling of Reliable Parallel Applications on Heterogeneous Distributed Embedded Systems
    Xie, Guoqi
    Chen, Yuekun
    Xiao, Xiongren
    Xu, Cheng
    Li, Renfa
    Li, Keqin
    IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, 2018, 3 (03) : 167 - 181
  • [27] A FAULT-TOLERANT SCHEDULING PROBLEM
    LIESTMAN, AL
    CAMPBELL, RH
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1986, 12 (11) : 1089 - 1095
  • [28] HIGH-PERFORMANCE FAULT-TOLERANT VLSI SYSTEMS USING MICRO ROLLBACK
    TAMIR, Y
    TREMBLAY, M
    IEEE TRANSACTIONS ON COMPUTERS, 1990, 39 (04) : 548 - 554
  • [29] TOFF-2: A high-performance fault-tolerant file service
    Chin, CC
    Tsai, SR
    JOURNAL OF SYSTEMS AND SOFTWARE, 2000, 53 (02) : 173 - 182
  • [30] Scalable, fault-tolerant job step management for high-performance systems
    Solt, D.
    Hursey, J.
    Lauria, A.
    Guo, D.
    Guo, X.
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2020, 64 (3-4) : 3 - 4