Model-based Fault Localization: Finding Behavioral Outliers in Large-scale Computing Systems

Cited by: 2
Authors
Maruyama, Naoya [1]
Matsuoka, Satoshi [1, 2]
Affiliations
[1] Tokyo Inst Technol, Global Sci Informat & Comp Ctr GSIC, Meguro Ku, Tokyo 1528550, Japan
[2] Res Org Informat & Syst, Natl Inst Informat, Chiyoda Ku, Tokyo 1018430, Japan
Keywords
Distributed Systems; Fault Localization; Performance;
DOI
10.1007/s00354-009-0088-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
We present a model-based approach to fault localization that aims to help the human analyst narrow down the manual localization into a small fraction of the overall system. Our method consists of two parts: pre-failure model derivation and post-failure model-based anomaly detection. The first part collects function-call traces from all processes and derives an execution model that reflects the function-calling behaviors of the target system. When a failure occurs, we identify the most deviant behaviors in the failed run by comparing the failure traces with the derived model. We claim that the analyst can substantially reduce the burden of fault localization by prioritizing such behaviors. Our preliminary experiment with a distributed job manager supports this claim: Our method narrows down localization of a 70-second faulty run on a 78-node distributed platform into just sub-second behaviors involving only two nodes.
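The two-part workflow in the abstract (derive a model from pre-failure function-call traces, then rank the failed run's behaviors by how far they deviate from it) can be illustrated with a minimal sketch. This is not the authors' model or scoring method, which the abstract does not specify. It assumes a simple stand-in: the model is the set of call-to-next-call transitions seen in normal runs, and deviation is the fraction of a trace's transitions that the model has never seen. The names `derive_model`, `deviation`, `rank_outliers`, and the node and function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Trace = List[str]  # one process's function-call sequence (assumed format)


def derive_model(normal_traces: List[Trace]) -> Counter:
    """Pre-failure step: count call-to-next-call transitions in normal runs."""
    model: Counter = Counter()
    for trace in normal_traces:
        model.update(zip(trace, trace[1:]))
    return model


def deviation(trace: Trace, model: Counter) -> float:
    """Fraction of transitions in `trace` that never occur in the model."""
    transitions = list(zip(trace, trace[1:]))
    if not transitions:
        return 0.0
    unseen = sum(1 for t in transitions if model[t] == 0)
    return unseen / len(transitions)


def rank_outliers(
    failure_traces: Dict[str, Trace], model: Counter
) -> List[Tuple[str, float]]:
    """Post-failure step: order nodes from most to least deviant."""
    scored = [(node, deviation(t, model)) for node, t in failure_traces.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical traces: two normal runs, then a failed run on two nodes.
normal = [
    ["recv", "parse", "dispatch", "send"],
    ["recv", "parse", "dispatch", "send"],
]
failed = {
    "node01": ["recv", "parse", "dispatch", "send"],
    "node02": ["recv", "parse", "retry", "retry", "abort"],
}
ranking = rank_outliers(failed, derive_model(normal))
print(ranking)  # node02 is listed first because 3 of its 4 transitions are unseen
```

An analyst following this sketch would inspect the top-ranked nodes first, which mirrors the prioritization the abstract argues for.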
Pages: 237 - 255
Number of pages: 19
Related papers
50 in total
  • [1] Model-based Fault Localization: Finding Behavioral Outliers in Large-scale Computing Systems
    Naoya Maruyama
    Satoshi Matsuoka
    New Generation Computing, 2010, 28 : 237 - 255
  • [2] Model-based fault localization in large-scale computing systems
    Maruyama, Naoya
    Matsuoka, Satoshi
    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, 2008, : 1841 - 1852
  • [3] Finding Outliers in Gaussian Model-based Clustering
    Clark, Katharine M.
    Mcnicholas, Paul D.
    JOURNAL OF CLASSIFICATION, 2024, 41 (02) : 313 - 337
  • [4] Model-based engineering of large-scale real-time systems
    Bapty, TA
    Sztipanovits, J
    INTERNATIONAL CONFERENCE AND WORKSHOP ON ENGINEERING OF COMPUTER-BASED SYSTEMS, PROCEEDINGS, 1997, : 467 - 474
  • [5] DECENTRALIZED MODEL-BASED CONTROL OF A CLASS OF LARGE-SCALE INTERCONNECTED SYSTEMS
    PANDIAN, SR
    HANMANDLU, M
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 1993, 24 (03) : 499 - 514
  • [6] Model-based testing of global properties on large-scale distributed systems
    Sunye, Gerson
    de Almeida, Eduardo Cunha
    Le Traon, Yves
    Baudry, Benoit
    Jezequel, Jean-Marc
    INFORMATION AND SOFTWARE TECHNOLOGY, 2014, 56 (07) : 749 - 762
  • [7] Robust Large-Scale Collaborative Localization Based on Semantic Submaps With Extreme Outliers
    Tang, Yujie
    Wang, Meiling
    Yang, Yi
    Lan, Ziquan
    Yue, Yufeng
    IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2024, 29 (04) : 2649 - 2660
  • [8] dFault: Fault Localization in Large-Scale Peer-to-Peer Systems
    Prakash, Pawan
    Kompella, Ramana Rao
    Ramasubramanian, Venugopalan
    Chandra, Ranveer
    MIDDLEWARE 2010, 2010, 6452 : 252 - +
  • [9] Delta-oriented model-based integration testing of large-scale systems
    Lochau, Malte
    Lity, Sascha
    Lachmann, Remo
    Schaefer, Ina
    Goltz, Ursula
    JOURNAL OF SYSTEMS AND SOFTWARE, 2014, 91 : 63 - 84
  • [10] A model-based fatigue damage estimation framework of large-scale structural systems
    Giagopoulos, Dimitrios
    Arailopoulos, Alexandros
    Natsiavas, Sotirios
    STRUCTURAL HEALTH MONITORING-AN INTERNATIONAL JOURNAL, 2021, 20 (03) : 834 - 847