ViSRE: A Unified Visual Analysis Dashboard for Proactive Cloud Outage Management

Cited by: 2
Authors
Kayongo, Paula [1]
Hoffswell, Jane [2]
Saini, Shiv [3]
Garg, Shaddy [3]
Koh, Eunyee [4]
Wang, Haoliang [4]
Jacobs, Tom [4]
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] Adobe Res, Washington, DC USA
[3] Adobe Res, Bangalore, Karnataka, India
[4] Adobe Res, Santa Clara, CA USA
Keywords
Cloud Outage Prediction; Root Cause Analysis; Software Visualization
DOI
10.1109/VISSOFT55257.2022.00010
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Efficient outage detection and remediation is crucial for effectively operating cloud computing systems. To remediate outages, system engineers must quickly identify the causal relationships between metrics and correlate events across multiple monitoring tools. In practice, this process largely remains reactive due to the complexity and general lack of interpretability within such monitoring environments. This work presents ViSRE: an integrated visual analytics system that integrates causal and predictive models with interactive visualizations to aid in proactive cloud outage management. We develop enhanced node representations for our causal graph representation to support system engineers in performing root cause analysis and reasoning about causality chains in multi-dimensional temporal data. We report the results of a quantitative assessment of the proposed predictive models, which show good performance guarantees. To evaluate and refine our system, we conduct a study with six cloud system engineers who verify that our proposed techniques can support proactive cloud maintenance by intuitively displaying temporal relationships between predicted and raw data. By correlating and presenting data from disparate sources, ViSRE also reduces context switching costs and reduces the time spent on manually correlating events during remediation of time-critical outages.
Pages: 5-16
Number of pages: 12
Related papers
50 in total
  • [31] VAAD: Visual Attention Analysis Dashboard applied to e-Learning
    Navarro, Miriam
    Becerra, Alvaro
    Daza, Roberto
    Cobos, Ruth
    Morales, Aythami
    Fierrez, Julian
    XXVI INTERNATIONAL SYMPOSIUM ON COMPUTERS IN EDUCATION, SIIE 2024, 2024,
  • [32] URL: A unified reinforcement learning approach for autonomic cloud management
    Xu, Cheng-Zhong
    Rao, Jia
    Bu, Xiangping
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2012, 72 (02) : 95 - 105
  • [33] Research on Unified Resource Management and Scheduling System in Cloud Environment
    Jiang, Hua
    Xiao, Yanli
    WIRELESS PERSONAL COMMUNICATIONS, 2018, 102 (02) : 963 - 973
  • [35] Security of Visual Codes in Service Management in the Cloud
    Ogiela, Lidia
    Ogiela, Marek R.
    2017 2ND INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATICS AND BIOMEDICAL SCIENCES (ICIIBMS), 2017, : 165 - 168
  • [36] Exact Outage Probability Analysis of Proactive Relay Selection in Cognitive Radio Networks with MRC Receivers
    Ho-Van, Khuong
    JOURNAL OF COMMUNICATIONS AND NETWORKS, 2016, 18 (03) : 288 - 298
  • [37] Proactive Thermal-Aware Resource Management in Virtualized HPC Cloud Datacenters
    Lee, Eun Kyung
    Viswanathan, Hariharasudhan
    Pompili, Dario
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2017, 5 (02) : 234 - 248
  • [38] IMPROVING CELL OUTAGE MANAGEMENT THROUGH DATA ANALYSIS
    de la Bandera, Isabel
    Munoz, Pablo
    Serrano, Inmaculada
    Barco, Raquel
    IEEE WIRELESS COMMUNICATIONS, 2017, 24 (04) : 113 - 119
  • [39] Simple and accurate methods for outage analysis in cellular mobile radio systems - A unified approach
    Annamalai, A
    Tellambura, C
    Bhargava, VK
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2001, 49 (02) : 303 - 316
  • [40] UniDRM: Unified Data and Resource Management for Federated Vehicular Cloud Computing
    Danquah, Wiseborn M.
    Altilar, D. Turgay
    IEEE ACCESS, 2021, 9 : 157052 - 157067