Detection of Silent Data Corruption in Fault-Tolerant Distributed Systems on Board Spacecraft

Cited by: 0
Authors
Fayyaz, Muhammad [1]
Vladimirova, Tanya [1]
Affiliations
[1] Univ Leicester, Dept Engn, Leicester LE1 7RH, Leics, England
Keywords
silent data corruption; distributed; computing; on-board; spacecraft; symptom; fault detection; isolation and reconfiguration; FRAMEWORK;
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents a novel distributed architecture for system-level Fault Detection, Isolation and Recovery (FDIR) aimed at spacecraft applications. The architecture reconfigures itself when a failure occurs, so that operation continues seamlessly. Two new algorithms for detecting Silent Data Corruption (SDC) errors are proposed: a selective redundancy method for transient SDC errors, and a distributed mechanism based on a data signature value for permanent SDC errors. Experimental results from prototyping with Xilinx Zynq FPGAs show that the proposed method can detect SDC faults in distributed nodes and tolerates node failures by migrating tasks to healthy nodes. Evaluation results show that the proposed SDC detection algorithms achieve very good fault coverage while using far fewer additional resources than physical redundancy.
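The abstract does not give algorithmic detail, so the following Python sketch is only an illustration of the two general ideas it names, not the authors' implementation. It assumes that "selective redundancy" can be approximated by re-running only tasks flagged as critical and comparing the two results, and that a "data signature value" can be approximated by a CRC-32 checksum that each node computes over a shared reference payload. All function names, the `critical` flag, and the use of CRC-32 are hypothetical choices made for this example.

```python
import zlib
from typing import Any, Callable, Dict, List, Tuple


def run_with_selective_redundancy(
    task: Callable[[Any], Any], data: Any, critical: bool
) -> Tuple[Any, bool]:
    """Run a task once, or twice with comparison if it is flagged critical.

    Returns (result_of_first_run, mismatch_detected). A mismatch between
    the two runs is treated as a symptom of a transient error.
    Assumption: the task is deterministic when no fault occurs.
    """
    first = task(data)
    if not critical:
        # Non-critical tasks are not duplicated, which saves resources.
        return first, False
    second = task(data)
    return first, first != second


def signature(payload: bytes) -> int:
    """Hypothetical data signature: CRC-32 of the payload."""
    return zlib.crc32(payload)


def suspect_nodes(reported: Dict[str, int], expected: int) -> List[str]:
    """Return the nodes whose reported signature differs from the expected one.

    A node that repeatedly reports a wrong signature for a known payload
    is a candidate for a permanent fault, and its tasks could then be
    migrated to healthy nodes.
    """
    return sorted(node for node, sig in reported.items() if sig != expected)


if __name__ == "__main__":
    # Transient-error check on a critical task (no fault injected here).
    result, mismatch = run_with_selective_redundancy(sum, [1, 2, 3], critical=True)
    print(result, mismatch)

    # Signature check across three hypothetical nodes; node "n2" is
    # simulated as faulty by flipping one bit of its signature.
    reference = b"reference-payload"
    expected = signature(reference)
    reports = {"n1": expected, "n2": expected ^ 0x1, "n3": expected}
    print(suspect_nodes(reports, expected))
```

Duplicating only the critical tasks, rather than every task, is what keeps the overhead of such a scheme below that of full physical redundancy, which is the trade-off the abstract highlights.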
Pages: 202 - 209
Number of pages: 8
Related papers
50 in total
  • [1] Survey and future directions of fault-tolerant distributed computing on board spacecraft
    Fayyaz, Muhammad
    Vladimirova, Tanya
    ADVANCES IN SPACE RESEARCH, 2016, 58 (11) : 2352 - 2375
  • [2] On fault-tolerant data replication in distributed systems
    Tenzekhti, F
    Day, K
    Ould-Khaoua, M
    MICROPROCESSORS AND MICROSYSTEMS, 2002, 26 (07) : 301 - 309
  • [3] Fault-Tolerant Architecture of Storage Device for On-board Spacecraft Control Systems
    Ryabtsev V.G.
    Volobuev S.V.
    Shubovich A.A.
    Russian Aeronautics, 2019, 62 (01) : 106 - 112
  • [4] A SYSTEMATIC FAULT-TOLERANT COMPUTATIONAL MODEL FOR BOTH CRASH FAILURES AND SILENT DATA CORRUPTION
    Cui, Xiaolong
    Hussain, Zaeem
    Znati, Taieb
    Melhem, Rami
    2018 21ST CONFERENCE ON INNOVATION IN CLOUDS, INTERNET AND NETWORKS AND WORKSHOPS (ICIN), 2018,
  • [5] A Fault-Tolerant Detection Fusion Strategy for Distributed Multisensor Systems
    Zhao, Shengli
    Zhou, Jie
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2016,
  • [6] UNDERSTANDING FAULT-TOLERANT DISTRIBUTED SYSTEMS
    CRISTIAN, F
    COMMUNICATIONS OF THE ACM, 1991, 34 (02) : 56 - 78
  • [7] Fault-tolerant Distributed Systems in Hardware
    Schmid, Stefan
    BULLETIN OF THE EUROPEAN ASSOCIATION FOR THEORETICAL COMPUTER SCIENCE, 2015, 2015 (116) : 111 - 153
  • [8] Synthesis of Fault-Tolerant Distributed Systems
    Dimitrova, Rayna
    Finkbeiner, Bernd
    AUTOMATED TECHNOLOGY FOR VERIFICATION AND ANALYSIS, PROCEEDINGS, 2009, 5799 : 321 - 336
  • [9] Adaptive distributed and fault-tolerant systems
    Hiltunen, MA
    Schlichting, RD
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 1996, 11 (05) : 275 - 285
  • [10] Fault detection and fault-tolerant control of actuators and sensors in distributed parameter systems
    Mu, Wenying
    Wang, Junping
    Feng, Weiwei
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2017, 354 (08) : 3341 - 3363