On the effectiveness of log representation for log-based anomaly detection

Cited by: 5
Authors
Wu, Xingfang [1]
Li, Heng [1]
Khomh, Foutse [1]
Affiliation
[1] Polytech Montreal, Dept Comp Engn & Software Engn, Montreal, PQ, Canada
Keywords
Log representation; Anomaly detection; Automated log analysis
DOI
10.1007/s10664-023-10364-1
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Logs are an essential source of information for people to understand the running status of a software system. Due to the evolving modern software architecture and maintenance methods, more research efforts have been devoted to automated log analysis. In particular, machine learning (ML) has been widely used in log analysis tasks. In ML-based log analysis tasks, converting textual log data into numerical feature vectors is a critical and indispensable step. However, the impact of using different log representation techniques on the performance of the downstream models is not clear, which limits researchers' and practitioners' ability to choose the optimal log representation techniques in their automated log analysis workflows. Therefore, this work investigates and compares the commonly adopted log representation techniques from previous log analysis research. Specifically, we select six log representation techniques and evaluate them with seven ML models and four public log datasets (i.e., HDFS, BGL, Spirit and Thunderbird) in the context of log-based anomaly detection. We also examine the impacts of the log parsing process and the different feature aggregation approaches when they are employed with log representation techniques. From the experiments, we provide some heuristic guidelines for future researchers and developers to follow when designing an automated log analysis workflow. We believe our comprehensive comparison of log representation techniques can help researchers and practitioners better understand the characteristics of different log representation techniques and provide them with guidance for selecting the most suitable ones for their ML-based log analysis workflow.
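To make the abstract's "converting textual log data into numerical feature vectors" step concrete, here is a minimal sketch. It is not taken from the paper: it assumes a plain TF-IDF weighting per log message and mean pooling as the feature aggregation step, which may or may not match any of the six techniques the authors evaluate. The log lines and the helper names (`tokenize`, `tfidf_vectors`, `mean_aggregate`) are invented for illustration.

```python
import math
from collections import Counter


def tokenize(message):
    # Assumption: naive lowercase whitespace tokenization, no log parsing.
    return message.lower().split()


def tfidf_vectors(messages):
    """Return (vocabulary, one TF-IDF vector per log message)."""
    docs = [tokenize(m) for m in messages]
    vocab = sorted({token for doc in docs for token in doc})
    n_docs = len(docs)
    # Document frequency: number of messages containing each token.
    df = Counter(token for doc in docs for token in set(doc))
    # One common IDF variant; other smoothing choices exist.
    idf = {token: math.log(n_docs / df[token]) + 1.0 for token in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # guard against empty messages
        vectors.append([tf[token] / length * idf[token] for token in vocab])
    return vocab, vectors


def mean_aggregate(vectors):
    """Aggregate message-level vectors into one sequence-level vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Hypothetical log sequence (made-up lines, not from any dataset).
session = [
    "Receiving block blk_1 src 10.0.0.1",
    "Receiving block blk_1 src 10.0.0.2",
    "Exception while serving blk_1",
]
vocab, vectors = tfidf_vectors(session)
session_vector = mean_aggregate(vectors)
```

The resulting `session_vector` has one entry per vocabulary token and could be passed to a downstream classifier. Swapping the weighting scheme or the pooling function changes the representation without touching the rest of the workflow, which is the kind of design choice the study compares.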
Pages: 39