A Cluster-Based Implementation of a Fault Tolerant Parallel Reduction Algorithm Using Swarm-Array Computing

Cited by: 2
Authors
Varghese, Blesson [1]
McKee, Gerard [1]
Alexandrov, Vassil [1]
Affiliations
[1] Univ Reading, Sch Syst Engn, Reading RG6 6AY, Berks, England
Keywords
swarm-array computing; intelligent agents; fault-tolerant system; cluster-based implementation;
DOI
10.1109/ICAS.2010.13
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent research in multi-agent systems incorporates fault tolerance concepts. However, that research does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm-array computing approach, namely 'Intelligent Agents'. In the approach considered, a task to be executed on a parallel computing system is decomposed into sub-tasks, which are mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information when a core or processor failure is predicted, so that the task can still be completed successfully. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and by an implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
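The agent idea in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' MPI implementation: the names `Agent`, `migrate`, and `reduce_with_agents` are hypothetical, cores are modelled as plain integers, failure prediction is assumed to be given as a set of core ids, and the reduction operator is assumed to be addition.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent: carries one sub-task result and its host core id."""
    core: int
    value: float


def migrate(agent, failing, spares):
    """Move an agent off a core predicted to fail onto a spare core."""
    if agent.core in failing:
        if not spares:
            raise RuntimeError("no spare core available for migration")
        agent.core = spares.pop()
    return agent


def reduce_with_agents(data, failing=frozenset(), spare_cores=()):
    """Pairwise tree reduction (sum); agents migrate before every step."""
    if not data:
        raise ValueError("data must not be empty")
    spares = list(spare_cores)
    # One agent per data item, initially placed on core i (an assumption).
    agents = [Agent(core=i, value=v) for i, v in enumerate(data)]
    while len(agents) > 1:
        agents = [migrate(a, failing, spares) for a in agents]
        merged = []
        for i in range(0, len(agents) - 1, 2):
            left, right = agents[i], agents[i + 1]
            # The left agent absorbs the right agent's partial result.
            merged.append(Agent(core=left.core, value=left.value + right.value))
        if len(agents) % 2:
            merged.append(agents[-1])  # an unpaired agent advances unchanged
        agents = merged
    return migrate(agents[0], failing, spares)


if __name__ == "__main__":
    result = reduce_with_agents(range(1, 9), failing={0, 2}, spare_cores=[8, 9])
    print(result.value, result.core)
```

In a cluster setting the migration step would involve sending the agent's state to another process; the sketch only shows the control flow of relocating a sub-task before the reduction step that would otherwise lose it.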
Pages: 30-36
Number of pages: 7