ALGORITHM-BASED FAULT TOLERANCE;
CHECKSUM ENCODING;
CONCURRENT ERROR DETECTION;
DEPENDENCE GRAPHS;
FAULT DETECTABILITY;
FAULT LOCATABILITY;
SYSTEM SYNTHESIS FOR FAULT TOLERANCE;
DOI:
10.1109/71.238622
Chinese Library Classification (CLC) number:
TP301 [Theory, methods];
Discipline classification code:
081202;
Abstract:
Algorithm-Based Fault Tolerance (ABFT) is a scheme to improve the reliability of parallel architectures used for computation-intensive tasks. The exact implementation of an ABFT scheme is algorithm-dependent. ABFT systems have very low overhead compared to other fault tolerance schemes with similar benefits. Few results are available in the area of general synthesis of ABFT systems. A two-stage approach to the synthesis of ABFT systems is proposed. In the first stage a system-level code is chosen to encode the data used in the algorithm. In the second stage the optimal architecture to implement the scheme is chosen using dependence graphs. Dependence graphs are a graph-theoretic form of algorithm representation. We demonstrate that not all architectures are ideal for the implementation of a particular ABFT scheme. We propose new measures to characterize the fault tolerance capability of a system to better exploit the proposed synthesis method. Dependence graphs can also be used for the synthesis of ABFT schemes for non-linear problems. An example of a fault-tolerant median filter is provided to illustrate their utility for such problems.
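The first stage described above, choosing a system-level code to encode the data, can be illustrated with the classic row/column checksum encoding for matrix multiplication. The following is a minimal sketch only, not the synthesis method of the paper: the function names (column_checksum, row_checksum, matmul, locate_fault) are hypothetical, integer data is assumed so that checksums compare exactly, and at most one faulty data element in the product is assumed.

```python
def column_checksum(a):
    """Return a copy of matrix a with an extra row holding each column's sum."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def row_checksum(b):
    """Return a copy of matrix b with an extra column holding each row's sum."""
    return [row + [sum(row)] for row in b]


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def locate_fault(c_full):
    """Check a full-checksum product matrix (illustrative helper).

    Returns None if every row and column checksum is consistent, or the
    (row, col) index of a single faulty data element. Raises ValueError
    if the mismatch pattern does not fit the single-fault assumption.
    """
    n = len(c_full) - 1        # number of data rows
    m = len(c_full[0]) - 1     # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    raise ValueError("mismatch pattern is not a single data-element fault")


# Encode the operands, multiply, then verify the result.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(column_checksum(a), row_checksum(b))
fault_free = locate_fault(c_full)   # no fault injected yet

c_full[1][0] += 5                   # simulate a faulty processor output
located = locate_fault(c_full)      # position of the corrupted element
```

The product of a column-checksum matrix and a row-checksum matrix is itself a full-checksum matrix, which is why one inconsistent row together with one inconsistent column pinpoints a single corrupted element. Which processor computes which element, and therefore how faults map onto checksum mismatches, is the architectural question the second stage addresses.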
Affiliations:
Islamic Azad Univ, Fac Engn, Dept Comp Engn, Arak, Iran
Univ Putra Malaysia, Fac Engn, Dept Comp & Commun Syst Engn, Serdang, Malaysia
Karimi, Abbas
Zarafshan, Faraneh
Jantan, Adznan B.