Fault-tolerant parallel scheduling of tasks on a heterogeneous high-performance workstation cluster

Cited by: 2
Authors
Kwok, YK [1]
Affiliations
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Hong Kong, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2001 / Vol. 19 / Issue 03
Keywords
parallel algorithms; cluster computing; heterogeneous systems; fault-tolerant scheduler; task graphs; neighborhood search;
DOI
10.1023/A:1011186732749
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., using an ATM switch at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search which works by refining multiple different initial schedules simultaneously using different workstations. The workstations communicate periodically to exchange their best solutions found thus far in order to direct the search to more promising regions in the search space. Heterogeneity of machines is exploited by the biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant in that the workload of a failed workstation is automatically redistributed to other workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our on-going development of SSI middleware for a Sun workstation cluster.
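The abstract names four ingredients: several initial schedules refined by random neighborhood search, periodic exchange of the best solution, a biased split of the work according to machine heterogeneity, and redistribution of a failed workstation's workload. The following is a minimal single-process sketch of how these pieces could fit together. It is not the paper's CBS implementation. Every name in it (`makespan`, `local_search`, `cbs_sketch`, `worker_power`, `budget`, `fail_at`) is hypothetical, communication costs between tasks are ignored, the "workstations" are simulated in a loop, and at least one worker is assumed to survive.

```python
import random

def makespan(assign, order, preds, cost, speed):
    """Finish time when tasks run in topological `order` on their assigned machine."""
    ready = [0.0] * len(speed)              # time at which each machine is free
    finish = {}
    for t in order:
        m = assign[t]
        start = max([ready[m]] + [finish[p] for p in preds[t]])
        finish[t] = start + cost[t] / speed[m]
        ready[m] = finish[t]
    return max(finish.values())

def local_search(assign, steps, rng, order, preds, cost, speed):
    """Random neighborhood search: reassign one task, keep the move if not worse."""
    best = makespan(assign, order, preds, cost, speed)
    for _ in range(steps):
        t = rng.choice(order)
        old = assign[t]
        assign[t] = rng.randrange(len(speed))
        new = makespan(assign, order, preds, cost, speed)
        if new <= best:
            best = new
        else:
            assign[t] = old                 # undo a worsening move
    return best

def cbs_sketch(order, preds, cost, speed, worker_power,
               rounds=5, budget=200, fail_at=None, seed=0):
    """Simulated search workers; fail_at=(round, worker) drops one worker."""
    rng = random.Random(seed)
    alive = set(range(len(worker_power)))
    # Each worker starts from a different random initial schedule.
    sched = {w: {t: rng.randrange(len(speed)) for t in order} for w in alive}
    best_assign, best_val = None, float("inf")
    for r in range(rounds):
        if fail_at and fail_at[0] == r:
            alive.discard(fail_at[1])       # simulated workstation failure
        # Biased split: search steps are proportional to worker power, and a
        # failed worker's share is spread over the survivors automatically.
        total = sum(worker_power[w] for w in alive)
        for w in sorted(alive):
            steps = round(budget * worker_power[w] / total)
            val = local_search(sched[w], steps, rng, order, preds, cost, speed)
            if val < best_val:
                best_val, best_assign = val, dict(sched[w])
        # Periodic exchange: every surviving worker continues from the best schedule.
        for w in alive:
            sched[w] = dict(best_assign)
    return best_val, best_assign
```

As a usage example, a four-task diamond graph with `preds = {0: [], 1: [0], 2: [0], 3: [1, 2]}` on two machines of speeds 1.0 and 2.0 can be searched with `cbs_sketch([0, 1, 2, 3], preds, cost, [1.0, 2.0], worker_power=[1, 2, 1], fail_at=(1, 0))`, which drops worker 0 at the start of round 1 and lets the remaining two workers absorb its share of the step budget.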
Pages: 299 - 314
Page count: 16
Related papers
50 in total
  • [1] Fault-Tolerant Parallel Scheduling of Tasks on a Heterogeneous High-Performance Workstation Cluster
    Yu-Kwong Kwok
    The Journal of Supercomputing, 2001, 19 : 299 - 314
  • [2] Fault-tolerant scheduling based on periodic tasks for heterogeneous systems
    Luo, Wei
    Yang, Fumin
    Pang, Liping
    Qin, Xiao
    AUTONOMIC AND TRUSTED COMPUTING, PROCEEDINGS, 2006, 4158 : 571 - 580
  • [3] Fault-Tolerant Scheduling of Real-Time Tasks on Heterogeneous Systems
    Wei, Mengxue
    Liu, Jing
    Li, Tao
    Xu, Xin
    Hu, Wei
    Zhao, Di
    PROCEEDINGS OF THE 2017 12TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2017, : 1006 - 1011
  • [4] Cluster delegation: High-performance, fault-tolerant data sharing in NFS
    Batsakis, A
    Burns, R
    14TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, PROCEEDINGS, 2005, : 100 - 109
  • [5] Contention awareness and fault-tolerant scheduling for precedence constrained tasks in heterogeneous systems
    Benoit, Anne
    Hakem, Mourad
    Robert, Yves
    PARALLEL COMPUTING, 2009, 35 (02) : 83 - 108
  • [6] Fault-tolerant high-performance cordic processors
    Kwak, JH
    Piuri, V
    Swartzlander, EE
    IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI SYSTEMS, PROCEEDINGS, 2000, : 164 - 172
  • [7] An efficient fault-tolerant scheduling algorithm for precedence constrained tasks in heterogeneous distributed systems
    Nakechbandi, M.
    Colin, J. -Y.
    Gashumba, J. B.
    INNOVATIONS AND ADVANCED TECHNIQUES IN COMPUTER AND INFORMATION SCIENCES AND ENGINEERING, 2007, : 301 - 307
  • [8] Fault-Tolerant Scheduling for Periodic Tasks based on DVFS
    Zhu, Ping
    Yang, Fumin
    Tu, Gang
    Luo, Wei
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS, VOLS 1-5, 2008, : 2186 - 2191
  • [9] Fault-tolerant scheduling of independent tasks in computational grid
    Zheng, Qin
    Veeravalli, Bharadwaj
    Tham, Chen-Khong
    2006 10TH IEEE SINGAPORE INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS, VOLS 1 AND 2, 2006, : 102 - +
  • [10] A HIGH-PERFORMANCE FAULT-TOLERANT SWITCHING NETWORK FOR ATM
    LIN, JF
    WANG, SD
    IEICE TRANSACTIONS ON COMMUNICATIONS, 1995, E78B (11) : 1518 - 1528