Checkpointing and rollback recovery for network of workstations

Cited by: 0
Authors
Wang, DS [1]
Zheng, WM [1]
Wang, KX [1]
Shen, MM [1]
Affiliations
[1] Tsing Hua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Source
Keywords
checkpointing; rollback recovery; network of workstations (NOW); domino effect; coordinated checkpointing
DOI
10.1007/BF02917117
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Networks of workstations (NOW) have become one of the main trends in parallel computing. For long-running scientific programs, however, a NOW needs effective fault tolerance because of its dynamic nature. Checkpointing and rollback recovery is one solution to this problem. The paper first discusses the main problems in rollback recovery and analyzes the different checkpointing techniques for NOW, then describes the design and implementation of the ChaRM (checkpoint-based rollback recovery and process migration) system. Finally, it compares three coordinated checkpointing systems.
Pages: 207-214
Number of pages: 8
Related papers
50 in total
  • [21] Winckp: a transparent checkpointing and rollback recovery tool for Windows NT applications
    Chung, PE
    Lee, WJ
    Huang, YN
    Liang, DR
    Wang, CY
    TWENTY-NINTH ANNUAL INTERNATIONAL SYMPOSIUM ON FAULT-TOLERANT COMPUTING, DIGEST OF PAPERS, 1999, : 220 - 223
  • [22] Low Overhead Incremental Checkpointing and Rollback Recovery Scheme on Windows Operating System
    Chen, Chih-Ho
    Ting, Yung
    Heh, Jia-Sheng
    THIRD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING: WKDD 2010, PROCEEDINGS, 2010, : 268 - 271
  • [23] A retrial queue for modeling fault-tolerant systems with checkpointing and rollback recovery
    Dimitriou, Ioannis
    COMPUTERS & INDUSTRIAL ENGINEERING, 2015, 79 : 156 - 167
  • [24] Scalable Checkpointing-based Rollback Recovery Protocol For Geographically Distributed Systems
    Ahn, Jinho
    INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY, PTS 1-4, 2013, 263-266 : 1492 - 1496
  • [25] Low-overhead checkpointing and rollback-recovery in distributed computing systems
    Liu, Yunlong
    Chen, Junliang
    Jisuanji Xuebao/Chinese Journal of Computers, 1999, 22 (03): : 249 - 257
  • [26] Quantifying rollback propagation in distributed checkpointing
    Agbaria, A
    Attiya, H
    Friedman, R
    Vitenberg, R
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2004, 64 (03) : 370 - 384
  • [27] Quantifying rollback propagation in distributed checkpointing
    Agbaria, A
    Attiya, H
    Friedman, R
    Vitenberg, R
    20TH IEEE SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 2001, : 36 - 45
  • [28] Checkpointing and rollback-recovery protocol integrated with VsSG protocol for RYW session guarantee
    Brzezinski, J
    Kobusinska, A
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING AND NETWORKS, 2006, : 174 - +
  • [29] Checkpointing for Peta-scale systems: A look into the future of practical rollback-recovery
    Elnozahy, EN
    Plank, JS
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2004, 1 (02) : 97 - 108
  • [30] Safety of checkpointing and rollback-recovery protocol for mobile systems with RYW session guarantee
    Brzezinski, Jerzy
    Kobusinska, Anna
    Kobusinski, Jacek
    ICEIS 2006: PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS: SOFTWARE AGENTS AND INTERNET COMPUTING, 2006, : 118 - +