RELIABILITY-ANALYSIS FOR THE EXECUTION OF REMOTE JOBS IN A WORKSTATION-BASED ENVIRONMENT

Cited by: 0
Authors
YANG, CQ [1]
QU, YS [1]
Affiliations
[1] IBM CORP, INFORMAT TECHNOL GRP, AUSTIN, TX 78758
Keywords
DISTRIBUTED ENVIRONMENT; REMOTE EXECUTION; FAULT-TOLERANCE; RELIABILITY; CHECKPOINTING
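DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812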
Abstract
Many workstation-based distributed systems allow programs to be executed on remote machines for effective utilization of system resources. Usually, the control policies in these systems force a remote job to be discontinued upon the arrival of local jobs to guarantee the autonomy of each workstation. Therefore, one special concern in the design of such systems is the fault-tolerance aspect of the execution of remote jobs. In this paper, we discuss two control policies of workstation-based distributed systems, the checkpointing policy and the non-checkpointing policy, which support fault-tolerant execution of remote jobs on idling workstations. An analysis of the reliability and mean turnaround time of the execution of remote jobs is conducted for both control policies. The optimal time interval between checkpoints in the checkpointing policy is formulated based on the given reliability and overhead of the system. In addition, several sample results derived from these analyses are compared with the outcome of corresponding simulation programs. Some observations on the fault-tolerant features of each control policy are then presented as guidelines for the future development of such workstation-based distributed systems.
Pages: 120-128
Page count: 9