Detection of Silent Data Corruption in Fault-Tolerant Distributed Systems on Board Spacecraft

Authors
Fayyaz, Muhammad [1]
Vladimirova, Tanya [1]
Affiliations
[1] Univ Leicester, Dept Engn, Leicester LE1 7RH, Leics, England
Keywords
silent data corruption; distributed computing; on-board; spacecraft; symptom; fault detection, isolation and reconfiguration; framework
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a novel distributed architecture for system-level Fault Detection, Isolation and Recovery (FDIR) aimed at spacecraft applications. When a failure occurs, the architecture reconfigures itself so that operation continues seamlessly. Two new algorithms for detecting Silent Data Corruption (SDC) errors are proposed: a selective redundancy method for transient SDC errors, and a distributed mechanism based on a data signature value for permanent SDC errors. Experimental results from a prototype built on Xilinx Zynq FPGAs show that the proposed method detects SDC faults in distributed nodes and tolerates node failures by migrating tasks to healthy nodes. Evaluation results show that the proposed SDC detection algorithms achieve very good fault coverage while using far fewer additional resources than physical redundancy.
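
The abstract names the two detection ideas but does not give their details, so the following is only a generic sketch of what such checks might look like, not the authors' algorithms. All function names are hypothetical, and CRC32 is an assumed stand-in for the paper's data signature value.

```python
import zlib
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple


def run_with_selective_redundancy(
    task: Callable[[Any], Any], data: Any, critical: bool
) -> Tuple[Any, bool]:
    """Run `task` once; re-run and compare only if it is marked critical.

    Returns (result, mismatch). A mismatch between the two runs hints at a
    transient SDC. Non-critical tasks skip the second run to save resources.
    """
    first = task(data)
    if not critical:
        return first, False
    second = task(data)
    return first, first != second


def signature(payload: bytes) -> int:
    """Assumed signature function: CRC32 of the payload."""
    return zlib.crc32(payload)


def find_suspect_nodes(signatures: Dict[str, int]) -> List[str]:
    """Flag nodes whose signature for the same reference data differs
    from the majority value. A node that disagrees repeatedly would be a
    candidate for a permanent SDC fault. Needs at least three nodes for
    the vote to be meaningful.
    """
    majority, _ = Counter(signatures.values()).most_common(1)[0]
    return sorted(node for node, sig in signatures.items() if sig != majority)
```

In a sketch like this, a node flagged by `find_suspect_nodes` over several rounds could be isolated and its tasks migrated to healthy nodes, which is the recovery behaviour the abstract describes.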
Pages: 202-209
Number of pages: 8