Checkpointing and rollback recovery for network of workstations

Authors
Wang, DS [1]
Zheng, WM [1]
Wang, KX [1]
Shen, MM [1]
Affiliation
[1] Tsing Hua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Source
Science in China Series E: Technological Sciences, 1999, 42
Keywords
checkpointing; rollback recovery; network of workstations (NOW); domino effect; coordinated checkpointing
DOI
10.1007/BF02917117
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Networks of workstations (NOW) have become one of the main trends in parallel computing. Because of the changing nature of a NOW, however, long-running scientific programs need effective fault tolerance. Checkpointing and rollback recovery is one solution to this problem. The paper first discusses the main problems of rollback recovery, then analyzes the different checkpointing techniques for NOW, and then describes the design and implementation of the ChaRM (checkpoint-based rollback recovery and process migration) system. It also compares three coordinated checkpointing systems.
Pages: 207-214
Number of pages: 8