Software-based replication for fault tolerance

Cited by: 143
Authors
Guerraoui, R
Schiper, A
Affiliations
[1] Federal Institute of Technology, Lausanne
[2] Department of Computer Science, EPFL, Operating Systems Laboratory
[3] Département d'Informatique, Ecl. Polytech. Federale de Lausanne
DOI
10.1109/2.585156
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Developers of early distributed systems took a simplistic approach to providing fault tolerance: They just used another copy of the same hardware as a backup. Later, others developed replication software to work on off-the-shelf hardware. Since neither of these methods is especially economical, a logical course is to take it one step further and eliminate the extra hardware altogether. Fully software-based replication relies on sophisticated techniques to keep track of server communications and ensure the consistency of information across several server replicas. How do you know that each server shares the same view of the data or program semantics? What happens if a server replica crashes? How do you make sure that a system processes invocations in the correct order? These are all problems that a replication technique has to handle. The authors describe two fundamental techniques, primary backup and active replication, and illustrate how they handle these problems. At this point, both have advantages and disadvantages that depend on the application. The authors also propose that group communication provides a sufficient framework for implementing software-based replication. The concept of static and dynamic groups proves useful in thinking about how to implement replication techniques. Replication techniques can also use total-order and view-synchronous multicast primitives from group communication.
Pages: 68 / +