Automated Traces-based Anomaly Detection and Root Cause Analysis in Cloud Platforms

Cited by: 1
Authors
Soualhia, Mbarka [1]
Wuhib, Fetahi [1]
Affiliations
[1] Ericsson Res Montreal, Montreal, PQ, Canada
Keywords
Kernel Traces; Root Cause Analysis; Anomaly Detection; Fault Detection; Cloud
DOI
10.1109/IC2E55432.2022.00034
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Current cloud infrastructures and their applications are increasingly complex, with confounding relationships among application elements and cloud infrastructure components. This makes timely identification of the root causes for faults that occur in such systems an important-yet-challenging task. In this paper, we propose a solution that automatically builds a correlation model and an anomaly detection model using kernel traces of cloud servers. The correlation model is used to capture the dependencies between the various elements of the cloud system while the anomaly detection model is used to identify anomalies related to specific elements of the system. Upon detection of a fault, our framework computes a dependency graph of detected anomalies using the models, which in turn is used to perform the root cause analysis. Evaluation results of our proposed framework on a Kubernetes cloud show that it can effectively find root causes of injected faults with an accuracy rate between 80% and 99.3%, with a low false negative rate.
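The last step described in the abstract, deriving root causes from a dependency graph of detected anomalies, can be illustrated with a minimal sketch. The abstract does not specify the authors' algorithm, so everything below is an assumption made for illustration: the function name `find_root_causes`, the `depends_on` mapping (standing in for the correlation model's output), the `anomalous` set (standing in for the anomaly detection model's output), and the rule that an anomalous element is a root-cause candidate when none of its own dependencies is anomalous.

```python
from typing import Dict, List, Set


def find_root_causes(depends_on: Dict[str, Set[str]], anomalous: Set[str]) -> List[str]:
    """Return anomalous elements that have no anomalous dependency.

    Assumed heuristic, not the paper's method: an anomaly is treated as a
    root-cause candidate if nothing it depends on is also anomalous.
    """
    roots = []
    for element in sorted(anomalous):
        # Keep only the dependencies that are themselves anomalous.
        anomalous_deps = depends_on.get(element, set()) & anomalous
        if not anomalous_deps:
            roots.append(element)
    return roots


# Hypothetical system: frontend depends on backend, backend depends on db.
depends_on = {
    "frontend": {"backend"},
    "backend": {"db"},
    "db": set(),
}
# Hypothetical detector output: frontend and backend look anomalous, db does not.
anomalous = {"frontend", "backend"}

root_causes = find_root_causes(depends_on, anomalous)
print(root_causes)  # ['backend']
```

In this toy case the frontend anomaly is explained by the backend anomaly, so only the backend is reported. Under this simple rule, a cycle made up entirely of anomalous elements yields no candidate, so a fuller implementation would need to handle that case separately.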
Pages: 253-260
Page count: 8
Related papers
50 in total
  • [1] Automated Anomaly Detection and Root Cause Analysis in Virtualized Cloud Infrastructures
    Lin, Jieyu
    Zhang, Qi
    Bannazadeh, Hadi
    Leon-Garcia, Alberto
    NOMS 2016 - 2016 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, 2016, : 550 - 556
  • [2] Anomaly Detection and Failure Root Cause Analysis in (Micro) Service-Based Cloud Applications: A Survey
    Soldani, Jacopo
    Brogi, Antonio
    ACM COMPUTING SURVEYS, 2023, 55 (03)
  • [3] Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis
    Wittkopp, Thorsten
    Acker, Alexander
    Kao, Odej
    2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 1231 - 1239
  • [4] Automated slow-start detection for anomaly root cause analysis and BBR identification
    Tlaiss, Ziad
    Ferrieux, Alexandre
    Amigo, Isabel
    Hamchaoui, Isabelle
    Vaton, Sandrine
    ANNALS OF TELECOMMUNICATIONS, 2024, 79 (3-4) : 149 - 163
  • [5] Automated slow-start detection for anomaly root cause analysis and BBR identification
    Ziad Tlaiss
    Alexandre Ferrieux
    Isabel Amigo
    Isabelle Hamchaoui
    Sandrine Vaton
    Annals of Telecommunications, 2024, 79 : 149 - 163
  • [6] CloudRCA: A Root Cause Analysis Framework for Cloud Computing Platforms
    Zhang, Yingying
    Guan, Zhengxiong
    Qian, Huajie
    Xu, Leili
    Liu, Hengbo
    Wen, Qingsong
    Sun, Liang
    Jiang, Junwei
    Fan, Lunting
    Ke, Min
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 4373 - 4382
  • [7] Anomaly Detection and Root Cause Analysis on Log Data
    Pasha, Daem
    Shah, Ali Hussain
    Zadeh, Esmaeil Habib
    Konur, Savas
    ARTIFICIAL INTELLIGENCE XXXIX, AI 2022, 2022, 13652 : 333 - 339
  • [8] On Anomaly Detection and Root Cause Analysis of Microservice Systems
    Guan, Zijie
    Lin, Jinjin
    Chen, Pengfei
    SERVICE-ORIENTED COMPUTING, ICSOC 2018, 2019, 11434 : 465 - 469
  • [9] Anomaly Detection with Root Cause Analysis for Bottling Process
    Bator, Martyna
    Dicks, Alexander
    Deppe, Sahar
    Lohweg, Volker
    2019 24TH IEEE INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES AND FACTORY AUTOMATION (ETFA), 2019, : 1619 - 1622
  • [10] Unsupervised Anomaly Detection and Root Cause Analysis in Mobile Networks
    Kim, Cheolmin
    Mendiratta, Veena B.
    Thottan, Marina
    2020 INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS & NETWORKS (COMSNETS), 2020,