Iterative Selection of Categorical Variables for Log Data Anomaly Detection

Cited by: 4
Authors
Landauer, Max [1]
Hoeld, Georg [1]
Wurzenberger, Markus [1]
Skopik, Florian [1]
Rauber, Andreas [2]
Affiliations
[1] Austrian Inst Technol, Giefinggasse 4, Vienna, Austria
[2] Vienna Univ Technol, Favoritenstr 9-11, Vienna, Austria
Source
Funding
EU Horizon 2020;
Keywords
OUTLIERS;
DOI
10.1007/978-3-030-88418-5_36
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Log data is a well-known source for anomaly detection in cyber security. Accordingly, a large number of approaches based on self-learning algorithms have been proposed in the past. Most of these approaches focus on numeric features extracted from logs, since these variables are convenient to use with commonly known machine learning techniques. However, system log data frequently involves multiple categorical features that provide further insights into the state of a computer system and thus have the potential to improve detection accuracy. Unfortunately, it is non-trivial to derive useful correlation rules from the vast number of possible values of all available categorical variables. Therefore, we propose the Variable Correlation Detector (VCD), which employs a sequence of selection constraints to efficiently disclose pairs of variables with correlating values. The approach also comprises an online mode that continuously updates the identified variable correlations to account for system evolution and applies statistical tests on conditional occurrence probabilities for anomaly detection. Our evaluations show that the VCD can be adjusted to fit the properties of the data at hand and discloses associated variables with high accuracy. Our experiments with real log data indicate that the VCD is capable of detecting attacks such as scans and brute-force intrusions with higher accuracy than existing detectors.
Pages: 757-777
Number of pages: 21
Related papers
50 in total
  • [1] Analysis of statistical properties of variables in log data for advanced anomaly detection in cyber security
    Wurzenberger, Markus
    Hoeld, Georg
    Landauer, Max
    Skopik, Florian
    COMPUTERS & SECURITY, 2024, 137
  • [2] Anomaly Detection Methods for Categorical Data: A Review
    Taha, Ayman
    Hadi, Ali S.
    ACM COMPUTING SURVEYS, 2019, 52 (02)
  • [3] Variable Selection for Correlated High-Dimensional Data with Infrequent Categorical Variables: Based on Sparse Sample Regression and Anomaly Detection Technology
    Kotsuka, Yuhei
    Arima, Sumika
    INTELLIGENT DECISION TECHNOLOGIES, KES-IDT 2021, 2021, 238 : 109 - 125
  • [4] Latent Gaussian process for anomaly detection in categorical data
    Lv, Fengmao
    Liang, Tao
    Zhao, Jiayi
    Zhuo, Zhongliu
    Wu, Jinzhao
    Yang, Guowu
    KNOWLEDGE-BASED SYSTEMS, 2021, 220
  • [5] Robust Log-Based Anomaly Detection on Unstable Log Data
    Zhang, Xu
    Xu, Yong
    Lin, Qingwei
    Qiao, Bo
    Zhang, Hongyu
    Dang, Yingnong
    Xie, Chunyu
    Yang, Xinsheng
    Cheng, Qian
    Li, Ze
    Chen, Junjie
    He, Xiaoting
    Yao, Randolph
    Lou, Jian-Guang
    Chintalapati, Murali
    Shen, Furao
    Zhang, Dongmei
    ESEC/FSE'2019: PROCEEDINGS OF THE 2019 27TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2019, : 807 - 817
  • [6] Unsaturated Log-Linear Model Selection for Categorical Data Analysis
    Ghosh, Subir
    Chowdhury, Arnab
    STATISTICS AND APPLICATIONS, 2021, 19 (01): : 417 - 429
  • [7] Anomaly Detection and Root Cause Analysis on Log Data
    Pasha, Daem
    Shah, Ali Hussain
    Zadeh, Esmaeil Habib
    Konur, Savas
    ARTIFICIAL INTELLIGENCE XXXIX, AI 2022, 2022, 13652 : 333 - 339
  • [8] InterpretableSAD: Interpretable Anomaly Detection in Sequential Log Data
    Han, Xiao
    Cheng, He
    Xu, Depeng
    Yuan, Shuhan
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 1183 - 1192
  • [9] Deep learning for anomaly detection in log data: A survey
    Landauer, Max
    Onder, Sebastian
    Skopik, Florian
    Wurzenberger, Markus
    MACHINE LEARNING WITH APPLICATIONS, 2023, 12