Graceful degradation in algorithm-based fault tolerant multiprocessor systems

Cited by: 9
Authors
Yajnik, S [1]
Jha, NK [1]
Affiliation
[1] PRINCETON UNIV, DEPT ELECT ENGN, PRINCETON, NJ 08544 USA
Keywords
algorithm-based fault tolerance; concurrent error detection; concurrent fault location; fault diagnosis; graceful degradation; transient faults
DOI
10.1109/71.577256
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Algorithm-based fault tolerance (ABFT) is a technique that improves the reliability of a multiprocessor system by giving it concurrent error detection and fault location capability. It encodes data at the system level and modifies the algorithm to operate on the encoded data, so that both transient and permanent faults in any processor are exposed. Previous work in this area addresses only the fault detection and location part of the problem. However, if spare processors are not available, then once a faulty processor has been located, the work initially assigned to it must be remapped to nonfaulty processors in the system in such a way that the fault tolerance capability of the system is maintained with as little degradation in performance as possible. In this paper, we propose an integrated deterministic solution to this problem that combines concurrent error detection and fault location with graceful degradation. No previous deterministic ABFT method exists for the design of general t-fault locating systems, even for the case of t = 1. We propose a general method for designing one-fault locating/s-fault detecting systems. We use an extended model for representing ABFT systems. This model treats the processors computing the checks as part of the ABFT system, so that faults in the check-computing processors can also be detected and located using a simple diagnosis algorithm, and the checks can be remapped to other nonfaulty processors in the system.
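To make the idea of "operating on encoded data" concrete, the sketch below uses the classic checksum-encoded matrix multiplication often used to introduce ABFT. It is not the method proposed in this paper, and it does not cover the remapping or graceful degradation steps. The function names (`matmul`, `add_column_checksum`, `add_row_checksum`, `locate_error`) are hypothetical, integer data is assumed so that checks compare exactly, and only a single corrupted data element is handled.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add_column_checksum(a):
    """Encode A by appending a row that holds the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b):
    """Encode B by appending a column that holds the sum of each row."""
    return [row + [sum(row)] for row in b]


def locate_error(c_full):
    """Check a full-checksum product matrix.

    Returns None if every row and column check passes, or (row, col) of
    the single inconsistent data element. Any other pattern (for example
    a corrupted checksum element) raises ValueError in this simplified
    sketch.
    """
    n = len(c_full) - 1       # number of data rows
    m = len(c_full[0]) - 1    # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    raise ValueError("error pattern is not a single data-element fault")


# Illustrative data only: multiply the encoded operands, then inject a fault.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(add_column_checksum(a), add_row_checksum(b))
clean_result = locate_error(c_full)

c_full[1][0] += 7             # simulate a faulty processor corrupting one element
faulty_result = locate_error(c_full)
```

The product of a column-checksum matrix and a row-checksum matrix carries both row and column checks, so one failing row check and one failing column check intersect at the faulty element. In a multiprocessor setting each element or block would be computed by a different processor, which is how a located element maps back to a located processor.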
Pages: 137-153
Page count: 17
Related papers
50 in total
  • [32] A genetic algorithm-based method for optimizing the energy consumption and performance of multiprocessor systems
    Pillai, Anju S.
    Singh, Kaumudi
    Saravanan, Vijayalakshmi
    Anpalagan, Alagan
    Woungang, Isaac
    Barolli, Leonard
    SOFT COMPUTING, 2018, 22 (10) : 3271 - 3285
  • [33] A Fault-tolerant Scheduling Algorithm Based on Grouping for Real-time Multiprocessor
    Yu, Xingbiao
    Zheng, Changwen
    Hu, Xiaohui
    Zhao, Junsuo
    2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2013, : 919 - 923
  • [34] ALGORITHM-BASED FAULT TOLERANCE FOR ADAPTIVE LEAST-SQUARES LATTICE FILTERING ON A HYPERCUBE MULTIPROCESSOR
    MUELLERTHUNS, RB
    MCFARLAND, D
    BANERJEE, P
    PROCEEDINGS OF THE 1989 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, VOL 3: ALGORITHMS AND APPLICATIONS, 1989, : 177 - 180
  • [35] AN IMPROVED HARDWARE IMPLEMENTATION OF THE FAULT-TOLERANT CLOCK SYNCHRONIZATION ALGORITHM FOR LARGE MULTIPROCESSOR SYSTEMS
    CHOI, BR
    KYU, HP
    KIM, M
    IEEE TRANSACTIONS ON COMPUTERS, 1990, 39 (03) : 404 - 407
  • [36] BOUNDS ON ALGORITHM-BASED FAULT TOLERANCE IN MULTIPLE PROCESSOR SYSTEMS
    BANERJEE, P
    ABRAHAM, JA
    IEEE TRANSACTIONS ON COMPUTERS, 1986, 35 (04) : 296 - 306
  • [37] DISTRIBUTED RECONFIGURATION STRATEGIES FOR FAULT-TOLERANT MULTIPROCESSOR SYSTEMS
    CLARKE, EM
    NIKOLAOU, CN
    IEEE TRANSACTIONS ON COMPUTERS, 1982, 31 (08) : 771 - 784
  • [38] COOPERATIVE DIAGNOSIS AND ROUTING IN FAULT-TOLERANT MULTIPROCESSOR SYSTEMS
    BLOUGH, DM
    WANG, HY
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1995, 27 (02) : 205 - 211
  • [39] RELIABILITY AND FAULT-TOLERANT ISSUES OF MULTIPROCESSOR AND MULTICOMPUTER SYSTEMS
    DAS, CR
    BHUYAN, LN
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 1987, 11 : 129 - 154
  • [40] Wavelet analysis and consensus algorithm-based fault-tolerant control for smart grids
    Han, Yunlong
    FRONTIERS IN ENERGY RESEARCH, 2023, 11