Detection of Silent Data Corruption in Fault-Tolerant Distributed Systems on Board Spacecraft

Cited by: 0
Authors
Fayyaz, Muhammad [1]
Vladimirova, Tanya [1]
Affiliations
[1] Univ Leicester, Dept Engn, Leicester LE1 7RH, Leics, England
Keywords
silent data corruption; distributed computing; on-board; spacecraft; symptom; fault detection, isolation and reconfiguration; framework
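DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract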
This paper presents a novel distributed architecture for system-level Fault Detection, Isolation and Recovery (FDIR) aimed at spacecraft applications. The architecture reconfigures itself when a failure occurs, so that operation continues seamlessly. Two new algorithms for detecting Silent Data Corruption (SDC) errors are proposed: a selective redundancy method for transient SDC errors, and a distributed mechanism based on a data signature value for permanent SDC errors. Experimental results from prototyping on Xilinx Zynq FPGAs show that the proposed method detects SDC faults in distributed nodes and tolerates node failures by migrating tasks to healthy nodes. Evaluation results show that the proposed SDC detection algorithms achieve very good fault coverage while using far fewer additional resources than physical redundancy.
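The abstract names the two detection ideas without giving their details, so the following is only a minimal illustrative sketch of the general techniques, not the authors' algorithms. It assumes that selective redundancy means re-executing only tasks flagged as critical and comparing the two outputs, and that the signature check means each node computes a checksum over the same reference data and nodes deviating from the majority are suspected. All names (`run_with_selective_redundancy`, `signature`, `find_suspect_nodes`) and the choice of CRC-32 are hypothetical.

```python
import zlib
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple


def run_with_selective_redundancy(
    task: Callable[[Any], Any], data: Any, critical: bool
) -> Tuple[Any, bool]:
    """Run a task; re-run it only if it is flagged critical (assumption).

    Returns (first_result, mismatch). A mismatch between the two runs
    hints at a transient corruption in one of the executions.
    """
    first = task(data)
    if not critical:
        return first, False
    second = task(data)
    return first, first != second


def signature(payload: bytes) -> int:
    """Hypothetical data signature: a CRC-32 over the payload."""
    return zlib.crc32(payload)


def find_suspect_nodes(signatures: Dict[str, int]) -> List[str]:
    """Return the nodes whose signature differs from the strict majority.

    If no value is held by more than half of the nodes, every node is
    returned as a suspect, since no trustworthy reference exists.
    """
    majority_value, count = Counter(signatures.values()).most_common(1)[0]
    if count * 2 <= len(signatures):
        return sorted(signatures)
    return sorted(n for n, s in signatures.items() if s != majority_value)
```

In this sketch a node that repeatedly appears in the suspect list would be a candidate for isolation, with its tasks migrated to healthy nodes as the abstract describes; the threshold for doing so is left open here.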
Pages: 202-209
Number of pages: 8