Application-level fault tolerance as a complement to system-level fault tolerance

Cited by: 14
Authors
Haines, J [1]
Lakamraju, V [1]
Koren, I [1]
Krishna, CM [1]
Affiliation
[1] Univ Massachusetts, Dept Elect & Comp Engn, Amherst, MA 01003 USA
Source
JOURNAL OF SUPERCOMPUTING | 2000 / Volume 16 / Issue 01
Keywords
distributed real-time systems; fault tolerance; checkpointing; imprecise computation; target tracking; beam forming
DOI
10.1023/A:1008181429693
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As multiprocessor systems become more complex, their reliability will need to increase as well. In this paper we propose a novel technique which is applicable to a wide variety of distributed real-time systems, especially those exhibiting data parallelism. System-level fault tolerance involves reliability techniques incorporated within the system hardware and software whereas application-level fault tolerance involves reliability techniques incorporated within the application software. We assert that, for high reliability, a combination of system-level fault tolerance and application-level fault tolerance works best. In many systems, application-level fault tolerance can be used to bridge the gap when system-level fault tolerance alone does not provide the required reliability. We exemplify this with the RTHT target tracking benchmark and the ABF beamforming benchmark.
Pages: 53-68
Page count: 16
Related papers
50 in total
  • [1] Application-level fault tolerance as a complement to system-level fault tolerance
    Haines, Joshua
    Lakamraju, Vijay
    Koren, Israel
    Krishna, C. Mani
    Journal of Supercomputing, 2000, 16 (1-2): 53 - 68
  • [2] Application-Level Fault Tolerance as a Complement to System-Level Fault Tolerance
    Joshua Haines
    Vijay Lakamraju
    Israel Koren
    C. Mani Krishna
    The Journal of Supercomputing, 2000, 16 : 53 - 68
  • [3] A survey of linguistic structures for application-level fault tolerance
    De Florio, Vincenzo
    Blondia, Chris
    ACM COMPUTING SURVEYS, 2008, 40 (02)
  • [4] Application-level correctness and its impact on fault tolerance
    Li, Xuanhua
    Yeung, Donald
    THIRTEENTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS, 2007, : 181 - +
  • [5] Application-level fault tolerance in the orbital thermal imaging spectrometer
    Ciocca, E
    Koren, I
    Koren, Z
    Krishna, CM
    Katz, DS
    10TH IEEE PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING, PROCEEDINGS, 2004, : 43 - 48
  • [6] A Comparison of Application-Level Fault Tolerance Schemes for Task Pools
    Posner, Jonas
    Reitz, Lukas
    Fohry, Claudia
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 105 : 119 - 134
  • [7] Application-Level Fault-Tolerance Solutions for Grid Computing
    Diaz, Daniel
    Pardo, Xoan C.
    Martin, Maria J.
    Gonzalez, Patricia
    CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS, 2008, : 554 - 559
  • [8] Application and System-Level Software Fault Tolerance Through Full System Restarts
    Abdi, Fardin
    Tabish, Rohan
    Rungger, Matthias
    Zamani, Majid
    Caccamo, Marco
    2017 ACM/IEEE 8TH INTERNATIONAL CONFERENCE ON CYBER-PHYSICAL SYSTEMS (ICCPS), 2017, : 197 - 206
  • [9] Bungie: Improving Fault Tolerance via Extensible Application-Level Protocols
    Christie, Samuel H., V
    Chopra, Amit Khushwant
    Singh, Munindar P.
    COMPUTER, 2021, 54 (05) : 44 - 53
  • [10] Application-Level Fault Tolerance in Real-Time Embedded Systems
    Afonso, Francisco
    Silva, Carlos
    Tavares, Adriano
    Montenegro, Sergio
    2008 INTERNATIONAL SYMPOSIUM ON INDUSTRIAL EMBEDDED SYSTEMS, 2008, : 126 - +