Predicting failures in multi-tier distributed systems

Cited by: 19
Authors
Mariani, Leonardo [1]
Pezze, Mauro [2, 3]
Riganelli, Oliviero [1]
Xin, Rui [3]
Affiliations
[1] Univ Milano Bicocca, Milan, Italy
[2] Univ Milano Bicocca, Software Engn, Milan, Italy
[3] Univ Lugano, Univ Svizzera Italiana, Lugano, Switzerland
Funding
European Union Horizon 2020
Keywords
Failure prediction; Multi-tier distributed systems; Self-healing systems; Data analytics; Machine learning; Cloud computing;
DOI
10.1016/j.jss.2019.110464
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Many applications are implemented as multi-tier software systems, and are executed on distributed infrastructures, like cloud infrastructures, to benefit from the cost reduction that derives from dynamically allocating resources on-demand. In these systems, failures are becoming the norm rather than the exception, and predicting their occurrence, as well as locating the responsible faults, are essential enablers of preventive and corrective actions that can mitigate the impact of failures, and significantly improve the dependability of the systems. Current failure prediction approaches suffer either from false positives or limited accuracy, and do not produce enough information to effectively locate the responsible faults. In this paper, we present PreMiSE, a lightweight and precise approach to predict failures and locate the corresponding faults in multi-tier distributed systems. PreMiSE blends anomaly-based and signature-based techniques to identify multi-tier failures that impact on performance indicators, with high precision and low false positive rate. The experimental results that we obtained on a Cloud-based IP Multimedia Subsystem indicate that PreMiSE can indeed predict and locate possible failure occurrences with high precision and low overhead. (C) 2019 Elsevier Inc. All rights reserved.
Pages: 19