Software-based replication for fault tolerance

Cited by: 143
Authors
Guerraoui, R
Schiper, A
Affiliations
[1] Federal Institute of Technology, Lausanne
[2] Department of Computer Science, EPFL, Operating Systems Laboratory
[3] Département d'Informatique, Ecl. Polytech. Federale de Lausanne
DOI
10.1109/2.585156
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Developers of early distributed systems took a simplistic approach to providing fault tolerance: They just used another copy of the same hardware as a backup. Later, others developed replication software to work on off-the-shelf hardware. Since neither of these methods is especially economical, a logical course is to take it one step further and eliminate the extra hardware altogether. Fully software-based replication relies on sophisticated techniques to keep track of server communications and ensure the consistency of information across several server replicas. How do you know that each server shares the same view of the data or program semantics? What happens if a server replica crashes? How do you make sure that a system processes invocations in the correct order? These are all problems that a replication technique has to handle. The authors describe two fundamental techniques, primary backup and active replication, and illustrate how they handle these problems. At this point, both have advantages and disadvantages that depend on the application. The authors also propose that group communication provides a sufficient framework for implementing software-based replication. The concept of static and dynamic groups proves useful in thinking about how to implement replication techniques. Replication techniques can also use total-order and view-synchronous multicast primitives from group communication.
Pages: 68 / +
Related papers
50 in total
  • [1] Efficient Software-Based Fault Tolerance Approach on Multicore Platforms
    Mushtaq, Hamid
    Al-Ars, Zaid
    Bertels, Koen
    DESIGN, AUTOMATION & TEST IN EUROPE, 2013, : 921 - 926
  • [2] Tuning Software-based Fault-tolerance Techniques for Power Optimization
    Chielle, Eduardo
    Kastensmidt, Fernanda Lima
    Cuenca-Asensi, Sergio
    2014 24TH INTERNATIONAL WORKSHOP ON POWER AND TIMING MODELING, OPTIMIZATION AND SIMULATION (PATMOS), 2014,
  • [3] Software-Based Hardware Fault Tolerance for Many-Core Architectures
    Wunderlich, Hans-Joachim
    IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE VLSI SYSTEMS, PROCEEDINGS, 2009, : 223 - 223
  • [5] Algorithm transformation methods to reduce the overhead of software-based fault tolerance techniques
    Azambuja, Jose Rodrigo
    Brown, Gustavo
    Kastensmidt, Fernanda Lima
    Carro, Luigi
    MICROELECTRONICS RELIABILITY, 2014, 54 (05) : 1050 - 1055
  • [6] Software-based fault tolerant array
    Centre for Development of Advanced Computing, Kolkata
    IEEE Potentials, 2006, 1 (41-45):
  • [7] Dependability in Embedded Systems: A Survey of Fault Tolerance Methods and Software-Based Mitigation Techniques
    Solouki, Mohammadreza Amel
    Angizi, Shaahin
    Violante, Massimo
    IEEE ACCESS, 2024, 12 : 180939 - 180967
  • [8] A software-based fault injection tool (SOFIT)
    Avresky, DR
    Geoghegan, SJ
    Tapadiya, PK
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 1998, 13 (06): : 327 - 337
  • [9] Software-based fault injection tool (SOFIT)
    Boston Univ, Boston, United States
    Comput Syst Sci Eng, 6 (327-337):
  • [10] Towards energy-aware software-based fault tolerance in real-time systems
    Unsal, OS
    Koren, I
    Krishna, CM
    ISLPED'02: PROCEEDINGS OF THE 2002 INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN, 2002, : 124 - 129