Dynamic Failure Detection in Distributed Environment

Cited by: 0
Authors
Noor, Ahmad Shukri Mohd [1]
Deris, Mustafa Mat [2]
Affiliations
[1] Univ Malaysia Terengganu, Dept Comp Sci, Fac Sci & Technol, Terengganu 21030, Malaysia
[2] Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Johor Baharu 86400, Malaysia
Keywords
Fault-Tolerance; Distributed Systems
DOI
10.1166/asl.2014.5309
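Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]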
Discipline classification code
07; 0710; 09
Abstract
Failure monitoring and detection is a critical part of providing scalability, reliability and high availability in current distributed environments. Heartbeat-style interaction is a popular technique for detecting faults: it monitors system resources continuously at very short intervals. However, this approach has limitations, as it requires a period of time to detect a faulty node, which delays the recovery procedures. This paper presents a fault detection mechanism and service that uses hybrid heartbeat mechanisms and a dynamic maximum time allocation interval for each heartbeat message. The technique introduces an index server for indexing transactions and uses a dynamic hybrid heartbeat mechanism together with a pinging procedure for fault detection. The evaluation indicates that the hybrid heartbeat mechanism reduces the time taken to detect a fault by approximately 30% compared to existing techniques and provides a basis for deploying customizable recovery actions.
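The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it names: a heartbeat detector whose maximum wait adapts to recently observed heartbeat intervals, with a ping used to confirm a suspected failure. The class name, the `safety_factor` and `window` parameters, the mean-based timeout rule and the status labels are all assumptions made for illustration, not the authors' design; the index server mentioned in the abstract is left out.

```python
from collections import deque
from typing import Callable, Deque, Optional


class HybridHeartbeatDetector:
    """Illustrative sketch (not the paper's method): adaptive heartbeat
    timeout plus a ping check before a node is declared failed."""

    def __init__(self, ping: Callable[[], bool], initial_timeout: float = 1.0,
                 safety_factor: float = 2.0, window: int = 5) -> None:
        self.ping = ping                        # hypothetical probe, True if node answers
        self.initial_timeout = initial_timeout  # used until intervals are observed
        self.safety_factor = safety_factor      # assumed multiplier on the mean interval
        self.intervals: Deque[float] = deque(maxlen=window)
        self.last_seen: Optional[float] = None

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat arrival and the gap since the previous one."""
        if self.last_seen is not None:
            self.intervals.append(now - self.last_seen)
        self.last_seen = now

    def timeout(self) -> float:
        """Dynamic maximum wait: mean of recent intervals times the safety factor."""
        if not self.intervals:
            return self.initial_timeout
        mean = sum(self.intervals) / len(self.intervals)
        return mean * self.safety_factor

    def status(self, now: float) -> str:
        """Return 'unknown', 'alive', 'slow' (overdue but pingable) or 'failed'."""
        if self.last_seen is None:
            return "unknown"
        if now - self.last_seen <= self.timeout():
            return "alive"
        # Heartbeat overdue: fall back to a ping before declaring failure.
        return "slow" if self.ping() else "failed"


if __name__ == "__main__":
    detector = HybridHeartbeatDetector(ping=lambda: False)
    for t in (0.0, 1.0, 2.0):
        detector.on_heartbeat(t)
    print(detector.timeout(), detector.status(3.5), detector.status(4.5))
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test; a real deployment would supply wall-clock or monotonic time and a network-level ping.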
Pages: 21-25
Number of pages: 5