Checkpointing and rollback recovery for network of workstations

Authors
Wang, DS [1]
Zheng, WM [1]
Wang, KX [1]
Shen, MM [1]
Affiliations
[1] Tsing Hua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Keywords
checkpointing; rollback recovery; network of workstations (NOW); domino effect; coordinated checkpointing
DOI
10.1007/BF02917117
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Networks of workstations (NOW) have become one of the main trends in parallel computing. Because a NOW is a changing environment, however, long-running scientific programs need effective fault tolerance, and checkpointing with rollback recovery is one solution to this problem. The main problems of rollback recovery are discussed first, and the different checkpointing techniques for NOW are analyzed. The design and implementation of the ChaRM (checkpoint-based rollback recovery and process migration) system are then described. Finally, three coordinated checkpointing systems are compared.
Pages: 207-214
Number of pages: 8