A NEW APPROACH TO SYSTEM-LEVEL FAULT-TOLERANCE IN MESSAGE-PASSING MULTICOMPUTERS

Authors
ZIMMERMAN, GW
ESFAHANIAN, AH
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
The loop is a commonly used interconnection network for computer systems. In this paper we consider the problem of making a loop network fault-tolerant. Previous solutions employ the absolute minimum number of redundant components for a specified level of fault tolerance. In our approach, "extra" redundancy is used to reduce the size and complexity of the interconnection network. Designs based on chordal rings are presented that can tolerate one and two processor failures. The examples given indicate that, for large-scale systems, the approach can produce improved designs that are more in accord with the limitations of current technology.
Pages: 357-363
Number of pages: 7