Iterative Selection of Categorical Variables for Log Data Anomaly Detection

Cited by: 4
Authors
Landauer, Max [1]
Hoeld, Georg [1]
Wurzenberger, Markus [1]
Skopik, Florian [1]
Rauber, Andreas [2]
Affiliations
[1] Austrian Inst Technol, Giefinggasse 4, Vienna, Austria
[2] Vienna Univ Technol, Favoritenstr 9-11, Vienna, Austria
Source
COMPUTER SECURITY - ESORICS 2021, PT I | 2021 / Vol. 12972
Funding
European Union Horizon 2020;
Keywords
OUTLIERS;
DOI
10.1007/978-3-030-88418-5_36
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Log data is a well-known source for anomaly detection in cyber security. Accordingly, a large number of approaches based on self-learning algorithms have been proposed in the past. Most of these approaches focus on numeric features extracted from logs, since these variables are convenient to use with commonly known machine learning techniques. However, system log data frequently involves multiple categorical features that provide further insights into the state of a computer system and thus have the potential to improve detection accuracy. Unfortunately, it is non-trivial to derive useful correlation rules from the vast number of possible values of all available categorical variables. Therefore, we propose the Variable Correlation Detector (VCD), which employs a sequence of selection constraints to efficiently disclose pairs of variables with correlating values. The approach also comprises an online mode that continuously updates the identified variable correlations to account for system evolution and applies statistical tests on conditional occurrence probabilities for anomaly detection. Our evaluations show that the VCD can be readily adjusted to fit the properties of the data at hand and discloses associated variables with high accuracy. Our experiments with real log data indicate that the VCD is capable of detecting attacks such as scans and brute-force intrusions with higher accuracy than existing detectors.
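The idea of testing conditional occurrence probabilities between two categorical variables can be illustrated with a minimal sketch. This is not the authors' VCD implementation: the function names, the field names (`user`, `src_ip`), the sample events, and the choice of an exact one-sided binomial test are all assumptions made for illustration, since the abstract does not specify which statistical test is used.

```python
from collections import Counter, defaultdict
from math import comb


def learn_conditionals(events, var_a, var_b):
    """Estimate P(var_b = b | var_a = a) from training events (list of dicts)."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event[var_a]][event[var_b]] += 1
    return {
        a: {b: n / sum(b_counts.values()) for b, n in b_counts.items()}
        for a, b_counts in counts.items()
    }


def binomial_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def deviating_pairs(model, window, var_a, var_b, alpha=0.05):
    """Return (a, b, p_value) for value pairs seen more often than learned."""
    observed = defaultdict(Counter)
    for event in window:
        observed[event[var_a]][event[var_b]] += 1
    findings = []
    for a, b_counts in observed.items():
        n = sum(b_counts.values())
        for b, k in b_counts.items():
            # Pairs never seen in training get probability 0 (assumption).
            p = model.get(a, {}).get(b, 0.0)
            p_value = binomial_tail(k, n, p)
            if p_value < alpha:
                findings.append((a, b, p_value))
    return findings


# Hypothetical training data: alice logs in mostly from 10.0.0.1.
train = (
    [{"user": "alice", "src_ip": "10.0.0.1"}] * 18
    + [{"user": "alice", "src_ip": "10.0.0.2"}] * 2
    + [{"user": "bob", "src_ip": "10.0.0.3"}] * 20
)
model = learn_conditionals(train, "user", "src_ip")

# Hypothetical detection window: alice now logs in mostly from 10.0.0.2.
window = (
    [{"user": "alice", "src_ip": "10.0.0.2"}] * 8
    + [{"user": "alice", "src_ip": "10.0.0.1"}] * 2
)
findings = deviating_pairs(model, window, "user", "src_ip")
```

In this sketch only the pair (`alice`, `10.0.0.2`) is reported, because eight occurrences in ten events is very unlikely under a learned conditional probability of 0.1. An online variant would additionally update `model` over time, as the abstract describes for the VCD.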
Pages: 757 - 777
Number of pages: 21
Related papers
50 in total
  • [21] Big Log Data Stream Processing: Adapting an Anomaly Detection Technique
    Dietz, Marietheres
    Pernul, Guenther
    DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA 2018), PT II, 2018, 11030 : 159 - 166
  • [22] LogKT: Hybrid Log Anomaly Detection Method for Cloud Data Center
    Ou, Xuedong
    Liu, Jing
    2023 IEEE 47TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC, 2023, : 164 - 173
  • [23] Anomaly detection of policies in distributed firewalls using data log analysis
    Azam Andalib
    Seyed Morteza Babamir
    The Journal of Supercomputing, 2023, 79 : 19473 - 19514
  • [24] Hive-Based Anomaly Detection in Hadoop Log Data Management
    Son, Siwoon
    Gil, Myeong-Seon
    Yang, Seokwoo
    Moon, Yang-Sae
    ADVANCES IN COMPUTER SCIENCE AND UBIQUITOUS COMPUTING, 2017, 421 : 837 - 842
  • [25] Anomaly detection of policies in distributed firewalls using data log analysis
    Andalib, Azam
    Babamir, Seyed Morteza
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (17): : 19473 - 19514
  • [26] Automated anomaly detection for categorical data by repurposing a form filling recommender system
    Belgacem, Hichem
    Li, Xiaochen
    Bianculli, Domenico
    Briand, Lionel
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2024, 16 (03):
  • [27] New Filter method for categorical variables' selection
    Bouhamed, H., 1600, International Journal of Computer Science Issues (IJCSI) (09): : 3 - 2
  • [28] Feature Selection for Anomaly Detection in Call Center Data
    Iheme, Leonardo O.
    Ozan, Sukru
    2019 11TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ELECO 2019), 2019, : 926 - 929
  • [29] On Proxy Variables and Categorical Data Fusion
    Zhang, Li-Chun
    JOURNAL OF OFFICIAL STATISTICS, 2015, 31 (04) : 783 - 807
  • [30] Unsupervised log message anomaly detection
    Farzad, Amir
    Gulliver, T. Aaron
    ICT EXPRESS, 2020, 6 (03): : 229 - 237