Graceful degradation in algorithm-based fault tolerant multiprocessor systems

Cited by: 9

Authors:

Yajnik, S [1]

Jha, NK [1]

Affiliations:

[1] PRINCETON UNIV, DEPT ELECT ENGN, PRINCETON, NJ 08544 USA

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 1997, Vol. 8, No. 2

Keywords:

algorithm-based fault tolerance; concurrent error detection; concurrent fault location; fault diagnosis; graceful degradation; transient faults

DOI:

10.1109/71.577256

Chinese Library Classification (CLC):

TP301 [Theory and methods]

Subject classification code:

081202

Abstract:

Algorithm-based fault tolerance (ABFT) is a technique which improves the reliability of a multiprocessor system by providing concurrent error detection and fault location capability to it. It encodes data at the system level and modifies the algorithm to operate on the encoded data in order to expose both transient and permanent faults in any processor. Work done till now in this area takes care of only the fault detection and location part of the problem. However, if spare processors are not available, then after a faulty processor has been located, the work initially assigned to it has to be mapped to some nonfaulty processors in the system in such a way that the fault tolerance capability of the system is still maintained with as small a degradation in performance as possible. In this paper, we propose an integrated deterministic solution to the above problem which combines concurrent error detection and fault location with graceful degradation. There exists no previous deterministic ABFT method for the design of general t-fault locating systems, even for the case of t = 1. We propose a general method for designing one-fault locating/s-fault detecting systems. We use an extended model for representing ABFT systems. This model considers the processors computing the checks to be a part of the ABFT system, so that faults in the check-computing processors can also be detected and located using a simple diagnosis algorithm, and the checks can be mapped to other nonfaulty processors in the system.
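
To make the idea of "encoding data and operating on the encoded data" concrete, the following is a minimal sketch of the classic row/column checksum scheme for matrix multiplication. It is an illustration only and is not the design method proposed in this paper. The function names (`add_column_checksum`, `add_row_checksum`, `locate_single_error`) are hypothetical, and the sketch assumes integer data and at most one corrupted data element in the result.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add_column_checksum(a):
    """Append one row holding the sum of each column (hypothetical helper)."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b):
    """Append one column holding the sum of each row (hypothetical helper)."""
    return [row + [sum(row)] for row in b]


def locate_single_error(c_full):
    """Return (row, col) of a single corrupted data element, or None.

    c_full is the full-checksum product: its last column holds row sums
    and its last row holds column sums. Assumes at most one faulty data
    element; any other pattern raises ValueError.
    """
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    raise ValueError("error pattern is not a single data-element fault")


# Encode the inputs, then multiply the encoded matrices.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
c_full = matmul(add_column_checksum(A), add_row_checksum(B))

# Simulate a transient fault in one element of the result.
c_faulty = [row[:] for row in c_full]
c_faulty[1][0] += 5
```

Here `locate_single_error(c_full)` finds no inconsistency, while `locate_single_error(c_faulty)` points at the intersection of the one failing row check and the one failing column check. In a multiprocessor setting each data element and each check would be computed by some processor, so locating the inconsistent element is a step toward locating the faulty processor; the remapping of its work to nonfaulty processors, which is the subject of the paper, is not shown.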


Pages: 137-153

Page count: 17
