Checkpointing and rollback recovery for network of workstations

Cited by: 0
Authors
Wang, DS [1 ]
Zheng, WM [1 ]
Wang, KX [1 ]
Shen, MM [1 ]
Affiliations
[1] Tsing Hua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Source
Keywords
checkpointing; rollback recovery; network of workstations (NOW); domino effect; coordinated checkpointing
DOI
10.1007/BF02917117
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Networks of workstations (NOW) have become one of the main trends in parallel computing. For long-running scientific programs, however, a NOW needs effective fault tolerance because of its changing nature. Checkpointing and rollback recovery is one solution to this problem. The main problems of rollback recovery are discussed first, and the different checkpointing techniques for NOW are analyzed. The design and implementation of the ChaRM (checkpoint-based rollback recovery and process migration) system are then described, and a comparison of three coordinated checkpointing systems is given.
Pages: 207-214
Number of pages: 8
Related papers
50 in total
  • [31] Diskless Checkpointing with Rollback-Dependency Trackability
    Menderico, Raphael Marcos
    Garcia, Islene Calciolari
    2010 29TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS SRDS 2010, 2010, : 275 - 281
  • [32] Analysis of a transaction system with checkpointing, failures, and rollback
    Kumar, L
    Misra, M
    Mitrani, I
    COMPUTER PERFORMANCE EVALUATION: MODELLING TECHNIQUES AND TOOLS, 2002, 2324 : 279 - 288
  • [33] Specification and Synthesis of Hardware Checkpointing and Rollback Mechanisms
    Chan, Carven
    Schwartz-Narbonne, Daniel
    Sethi, Divjyot
    Malik, Sharad
    2012 49TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2012, : 1222 - 1228
  • [34] Checkpointing and Rollback Recovery in Distributed Systems: Existing Solutions, Open Issues and Proposed Solutions
    Manivannan, D.
    NEW ASPECTS OF SYSTEMS, PTS I AND II, 2008, : 569 - +
  • [35] MPICH-GF: Transparent checkpointing and rollback-recovery for grid-enabled MPI processes
    Woo, N
    Jung, HS
    Yeom, HY
    Park, T
    Park, H
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (07): : 1820 - 1828
  • [36] Swich: A prototype for efficient cache-level checkpointing and rollback
    Teodorescu, Radu
    Nakano, Jun
    Torrellas, Josep
    IEEE MICRO, 2006, 26 (05) : 28 - 40
  • [37] NetCP: Consistent, Non-interruptive and Efficient Checkpointing and Rollback of SDN
    Yu, Ye
    Qian, Chen
    Wu, Wenfei
    Zhang, Ying
    2018 IEEE/ACM 26TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS), 2018,
  • [38] An Efficient Checkpointing and Rollback Recovery Scheme for Cluster-based Multi-channel Ad-hoc Wireless Networks
    Men, Chaoguang
    Xu, Zhenpeng
    Li, Xiang
    PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS, 2008, : 371 - 378
  • [39] Survive: Pointer-Based In-DRAM Incremental Checkpointing for Low-Cost Data Persistence and Rollback-Recovery
    Mirhosseini, Amirhossein
    Agrawal, Aditya
    Torrellas, Josep
    IEEE COMPUTER ARCHITECTURE LETTERS, 2017, 16 (02) : 153 - 157
  • [40] Markov Chain-based Modeling and Analysis of Checkpointing with Rollback Recovery for Efficient DSE in Soft Real-time Systems
    Sahoo, Siva Satyendra
    Veeravalli, Bharadwaj
    Kumar, Akash
    2020 33RD IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI AND NANOTECHNOLOGY SYSTEMS (DFT), 2020,