An effective fault-tolerant routing methodology for direct networks

Authors
Gómez, ME [1]
Flich, J [1]
López, P [1]
Robles, A [1]
Duato, J [1]
Nordbotten, NA [1]
Lysne, O [1]
Skeie, T [1]
Affiliations
[1] Univ Politecn Valencia, Dept Comp Engn, E-46071 Valencia, Spain
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Current massively parallel computing systems are being built with thousands of nodes, which significantly increases the probability of failure. In [14], we proposed a methodology to design fault-tolerant routing algorithms for direct interconnection networks. The methodology uses a simple mechanism: for some source-destination pairs, packets are first forwarded to an intermediate node and then from that node to the destination. Minimal adaptive routing is used along both subpaths. Where the methodology cannot find a suitable intermediate node, it combines the use of intermediate nodes with two additional mechanisms: disabling adaptive routing and using misrouting on a per-packet basis. While the combination of these three mechanisms tolerates a large number of faults, each one requires additional hardware support in the network and introduces some overhead. In this paper, we perform an in-depth analysis of the impact of these mechanisms on network behaviour, considering the three mechanisms both separately and in combination. The ultimate goal is to obtain a combination of mechanisms that balances the trade-off between fault-tolerance degree, routing complexity, and performance.
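The intermediate-node mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a 2D mesh without wraparound, node faults only, and an exhaustive search, and the names `box_is_fault_free` and `pick_intermediate` are hypothetical. It relies on the fact that, in a mesh, every minimal path between two nodes stays inside their bounding rectangle, so minimal adaptive routing is safe on a subpath when that rectangle contains no faulty node.

```python
from itertools import product


def hops(a, b):
    """Manhattan distance between two mesh nodes given as (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def box_is_fault_free(a, b, faults):
    """True if no faulty node lies in the bounding rectangle of a and b.

    Assumption: in a 2D mesh, minimal adaptive routing may visit any node
    of this rectangle, so the whole rectangle must be fault-free.
    """
    xs = range(min(a[0], b[0]), max(a[0], b[0]) + 1)
    ys = range(min(a[1], b[1]), max(a[1], b[1]) + 1)
    return not any((x, y) in faults for x, y in product(xs, ys))


def pick_intermediate(src, dst, faults, width, height):
    """Return None if direct minimal adaptive routing is safe, otherwise
    an intermediate node with the fewest total hops.

    Raises LookupError when no intermediate node works, which is the case
    where the abstract's other two mechanisms would be needed.
    """
    if box_is_fault_free(src, dst, faults):
        return None
    best = None
    for node in product(range(width), range(height)):
        if node in faults or node in (src, dst):
            continue
        if box_is_fault_free(src, node, faults) and box_is_fault_free(node, dst, faults):
            cost = hops(src, node) + hops(node, dst)
            if best is None or cost < best[0]:
                best = (cost, node)
    if best is None:
        raise LookupError("no suitable intermediate node")
    return best[1]
```

Under these assumptions, a fault lying in the same row as both source and destination leaves no valid intermediate node, so the sketch raises `LookupError`; this corresponds to the situation in which the abstract falls back on disabling adaptive routing or misrouting.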
Pages: 222-231
Number of pages: 10