Root Cause Analysis of Failures in Microservices through Causal Discovery

Cited by: 0
Authors
Ikram, Azam [1]
Chakraborty, Sarthak [2]
Mitra, Subrata [2]
Saini, Shiv Kumar [2]
Bagchi, Saurabh [1]
Kocaoglu, Murat [1]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Adobe Res, Mountain View, CA USA
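Funding
U.S. National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes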
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most cloud applications use a large number of smaller sub-components (called microservices) that interact with each other in the form of a complex graph to provide the overall functionality to the user. While the modularity of the microservice architecture is beneficial for rapid software development, maintaining and debugging such a system quickly in cases of failure is challenging. We propose a scalable algorithm for rapidly detecting the root cause of failures in complex microservice architectures. The key ideas behind our novel hierarchical and localized learning approach are: (1) to treat the failure as an intervention on the root cause to quickly detect it, (2) only learn the portion of the causal graph related to the root cause, thus avoiding a large number of costly conditional independence tests, and (3) hierarchically explore the graph. The proposed technique is highly scalable and produces useful insights about the root cause, while the use of traditional techniques becomes infeasible due to high computation time. Our solution is application agnostic and relies only on the data collected for diagnosis. For the evaluation, we compare the proposed solution with a modified version of the PC algorithm and the state-of-the-art for root cause analysis. The results show a considerable improvement in top-k recall while significantly reducing the execution time.
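Ideas (1) and (2) in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, only a simplified stand-in under stated assumptions: metrics are already discretized, a binary indicator `F` marks whether a sample comes from the failure period, dependence is scored with a plain Pearson chi-square statistic summed over strata, and conditioning sets contain at most one other metric. The metric names `A` and `B`, the helper names, and the toy counts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def chi2(pairs):
    """Pearson chi-square statistic for a list of (value, F) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    x_tot = Counter(x for x, _ in pairs)
    f_tot = Counter(f for _, f in pairs)
    stat = 0.0
    for x in x_tot:
        for f in f_tot:
            expected = x_tot[x] * f_tot[f] / n
            stat += (joint[(x, f)] - expected) ** 2 / expected
    return stat


def cond_chi2(rows, target, cond):
    """Dependence between `target` and F, summed over strata of `cond`."""
    strata = {}
    for row in rows:
        key = tuple(row[c] for c in cond)
        strata.setdefault(key, []).append((row[target], row["F"]))
    return sum(chi2(pairs) for pairs in strata.values())


def rank_root_causes(normal, failure, max_cond=1):
    """Rank metrics by their weakest dependence on the failure indicator F.

    A metric that stays dependent on F under every conditioning set tried
    is a root-cause candidate; one that becomes independent of F given
    another metric is treated as merely downstream of the failure.
    """
    # Assumption: the failure acts like an intervention, encoded as F = 1.
    rows = [dict(r, F=0) for r in normal] + [dict(r, F=1) for r in failure]
    metrics = [k for k in rows[0] if k != "F"]
    scores = {}
    for m in metrics:
        others = [o for o in metrics if o != m]
        cond_sets = [c for k in range(max_cond + 1)
                     for c in combinations(others, k)]
        scores[m] = min(cond_chi2(rows, m, c) for c in cond_sets)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def make_rows(a, b, count):
    """Hypothetical toy data: `count` samples with metric values A=a, B=b."""
    return [{"A": a, "B": b} for _ in range(count)]


# Toy chain A -> B: the failure shifts A, while B keeps following A
# with the same mechanism in both periods.
normal = (make_rows(0, 0, 16) + make_rows(0, 1, 4)
          + make_rows(1, 1, 4) + make_rows(1, 0, 1))
failure = (make_rows(1, 1, 16) + make_rows(1, 0, 4)
           + make_rows(0, 0, 4) + make_rows(0, 1, 1))

ranking = rank_root_causes(normal, failure)
print(ranking)
```

In this toy data, `A` is ranked first because it stays dependent on `F` whether or not `B` is conditioned on, while the score of `B` drops to zero once `A` is conditioned on. Only tests that involve `F` are run, which mirrors the abstract's point about learning just the part of the graph related to the root cause. A real implementation would use a calibrated conditional independence test with p-values rather than raw summed statistics.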
Pages: 13
Related papers
50 in total
  • [1] Chain-of-Event: Interpretable Root Cause Analysis for Microservices through Automatically Learning Weighted Event Causal Graph
    Yao, Zhenhe
    Pei, Changhua
    Chen, Wenxiao
    Wang, Hanzhang
    Su, Liangfei
    Jiang, Huai
    Xie, Zhe
    Nie, Xiaohui
    Pei, Dan
    COMPANION PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, FSE COMPANION 2024, 2024, : 50 - 61
  • [2] Failure Root Cause Analysis for Microservices, Explained
    Soldani, Jacopo
    Forti, Stefano
    Brogi, Antonio
    DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS (DAIS 2022), 2022, 13272 : 74 - 91
  • [3] Root Cause Analysis in Microservice Using Neural Granger Causal Discovery
    Lin, Cheng-Ming
    Chang, Ching
    Wang, Wei-Yao
    Wang, Kuang-Da
    Peng, Wen-Chih
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 1, 2024, : 206 - 213
  • [4] Leveraging on causal knowledge for enhancing the root cause analysis of equipment spot inspection failures
    Zhou, Bin
    Li, Jie
    Li, Xinyu
    Hua, Bao
    Bao, Jinsong
    ADVANCED ENGINEERING INFORMATICS, 2022, 54
  • [5] Root cause analysis in engineering failures
    Bhaumik, S. K.
    TRANSACTIONS OF THE INDIAN INSTITUTE OF METALS, 2010, 63 (2-3) : 297 - 299
  • [6] Root Cause Analysis of Generator Failures
    Hudon, C.
    Levesque, M.
    Nguyen, D-H
    Millet, C.
    Truchon, F.
    CONFERENCE RECORD OF THE 2012 IEEE INTERNATIONAL SYMPOSIUM ON ELECTRICAL INSULATION (ISEI), 2012, : 199 - 203
  • [7] Root cause analysis in engineering failures
    S. K. Bhaumik
    Transactions of the Indian Institute of Metals, 2010, 63 : 297 - 299
  • [8] Anomaly Detection and Root Cause Analysis of Microservices Energy Consumption
    Floroiu, Maximilian Stefan
    Russo, Stefano
    Giamattei, Luca
    Guerriero, Antonio
    Malavolta, Ivano
    Pietrantuono, Roberto
    2024 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2024, 2024, : 590 - 600
  • [9] ROOT CAUSE ANALYSIS OF MOTOR STATOR FAILURES
    Gaerke, Tyler R.
    Hernandez, David C.
    CONFERENCE RECORD OF 2013 ANNUAL IEEE PULP AND PAPER INDUSTRY TECHNICAL CONFERENCE (PPIC), 2013,
  • [10] An unsupervised root cause analysis method for satellite on-orbit anomalies based on causal discovery
    Chen, Siya
    Long, Xi
    Jin, Guang
    Zeng, Zefan
    ADVANCES IN SPACE RESEARCH, 2023, 72 (09) : 3842 - 3855