Software-based replication for fault tolerance

Cited by: 143
Authors
Guerraoui, R
Schiper, A
Affiliations
[1] Federal Institute of Technology, Lausanne
[2] Department of Computer Science, EPFL, Operating Systems Laboratory
[3] Département d'Informatique, Ecl. Polytech. Federale de Lausanne
DOI
10.1109/2.585156
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Developers of early distributed systems took a simplistic approach to providing fault tolerance: They just used another copy of the same hardware as a backup. Later, others developed replication software to work on off-the-shelf hardware. Since neither of these methods is especially economical, a logical course is to take it one step further and eliminate the extra hardware altogether. Fully software-based replication relies on sophisticated techniques to keep track of server communications and ensure the consistency of information across several server replicas. How do you know that each server shares the same view of the data or program semantics? What happens if a server replica crashes? How do you make sure that a system processes invocations in the correct order? These are all problems that a replication technique has to handle. The authors describe two fundamental techniques, primary backup and active replication, and illustrate how they handle these problems. At this point, both have advantages and disadvantages that depend on the application. The authors also propose that group communication provides a sufficient framework for implementing software-based replication. The concept of static and dynamic groups proves useful in thinking about how to implement replication techniques. Replication techniques can also use total-order and view-synchronous multicast primitives from group communication.
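The ordering question raised in the abstract can be illustrated with a small sketch. This is not the authors' protocol; it is a toy, single-process simulation of active replication, assuming a hypothetical `Sequencer` that stands in for a total-order multicast primitive and never fails. The names `Replica`, `Sequencer`, and `run_demo` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Deterministic state machine holding one integer (hypothetical example)."""
    value: int = 0
    next_seq: int = 0                              # next sequence number to apply
    pending: dict = field(default_factory=dict)    # seq -> operation held back

    def deliver(self, seq, op):
        """Accept a message in any arrival order; apply strictly in seq order."""
        self.pending[seq] = op
        while self.next_seq in self.pending:
            kind, arg = self.pending.pop(self.next_seq)
            if kind == "add":
                self.value += arg
            elif kind == "mul":
                self.value *= arg
            self.next_seq += 1


class Sequencer:
    """Stand-in for total-order multicast (assumption: a single, reliable sequencer)."""

    def __init__(self):
        self.counter = 0

    def stamp(self, op):
        # Assign the next global sequence number to the invocation.
        seq = self.counter
        self.counter += 1
        return seq, op


def run_demo():
    sequencer = Sequencer()
    msgs = [
        sequencer.stamp(("add", 2)),
        sequencer.stamp(("mul", 5)),
        sequencer.stamp(("add", 1)),
    ]
    a, b = Replica(), Replica()
    for m in msgs:              # replica a receives messages in send order
        a.deliver(*m)
    for m in reversed(msgs):    # replica b receives them in reverse order
        b.deliver(*m)
    return a.value, b.value
```

Both replicas end in the same state because each applies the invocations in sequence order regardless of arrival order. Applying the same three operations in arrival order on the second replica would instead give (0 + 1) * 5 + 2 = 7, which is the kind of divergence a total-order primitive is meant to rule out.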
Pages: 68 / +
Related papers
50 in total
  • [41] Software fault tolerance for distributed object based computing
    Kim, HC
    Nair, VSS
    JOURNAL OF SYSTEMS AND SOFTWARE, 1997, 39 (02) : 103 - 117
  • [42] Software fault tolerance for distributed object based computing
    Southern Methodist Univ, Dallas, United States
    J Syst Software, 2 (103-117):
  • [43] Software-based microarchitectural attacks
    Gruss, Daniel
    IT-INFORMATION TECHNOLOGY, 2018, 60 (5-6): : 335 - 341
  • [44] Software-based diagnosis for processors
    Chen, L
    Dey, S
    39TH DESIGN AUTOMATION CONFERENCE, PROCEEDINGS 2002, 2002, : 259 - 262
  • [45] A Replication-Based Mechanism for Fault Tolerance in MapReduce Framework
    Liu, Yang
    Wei, Wei
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [46] Fault tolerance protocols for parallel programs based on tasks replication
    Aguilar, J
    Hernández, M
    8TH INTERNATIONAL SYMPOSIUM ON MODELING, ANALYSIS AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS, PROCEEDINGS, 2000, : 397 - 404
  • [47] IOGuard: Software-Based I/O Page Fault Handling with One CPU Core
    Dong, Yiyuan
    Mi, Zeyu
    PROCEEDINGS OF THE 15TH ASIA-PACIFIC SYMPOSIUM ON INTERNETWARE, INTERNETWARE 2024, 2024, : 337 - 346
  • [48] Fault-Independent Test-Generation for Software-Based Self-Testing
    Georgiou, Panagiotis
    Kavousianos, Xrysovalantis
    Cantoro, Riccardo
    Reorda, Matteo Sonza
    IEEE TRANSACTIONS ON DEVICE AND MATERIALS RELIABILITY, 2019, 19 (02) : 341 - 349
  • [49] Fault-Independent Test-Generation for Software-Based Self-Testing
    Georgiou, Panagiotis
    Kavousianos, Xrysovalantis
    Cantoro, Riccardo
    Reorda, Matteo Sonza
    2018 IEEE 24TH INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN (IOLTS 2018), 2018, : 79 - 84
  • [50] The Study on Software Fault Tolerance
    Li, Liqing
    Lu, Hai
    Li, Xudong
    MATERIALS, MECHANICAL ENGINEERING AND MANUFACTURE, PTS 1-3, 2013, 268-270 : 1790 - +