Checkpointing and rollback recovery for network of workstations

Cited by: 0
Authors
Wang, DS [1 ]
Zheng, WM [1 ]
Wang, KX [1 ]
Shen, MM [1 ]
Affiliation
[1] Tsing Hua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Source
Keywords
checkpointing; rollback recovery; network of workstations (NOW); domino effect; coordinated checkpointing
DOI
10.1007/BF02917117
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Networks of workstations (NOW) have become one of the main trends in parallel computing. Because of the changing nature of a NOW, however, long-running scientific programs need effective fault tolerance. Checkpointing and rollback recovery is one solution to this problem. The paper first discusses the main problems of rollback recovery and analyzes the different checkpointing techniques for NOW, then describes the design and implementation of the ChaRM (checkpoint-based rollback recovery and process migration) system. Finally, it compares three coordinated checkpointing systems.
Pages: 207-214
Page count: 8