Efficient density and cluster based incremental outlier detection in data streams

Cited by: 45
Authors
Degirmenci, Ali [1]
Karal, Omer [1]
Affiliations
[1] Ankara Yildirim Beyazit Univ, Ayvali Mah 150,Sok Etlik Kecioren, Ankara, Turkey
Keywords
LOF; DBSCAN; Outlier detection; Core KNN; Incremental learning; Data stream
DOI
10.1016/j.ins.2022.06.013
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a novel, parameter-free, incremental local density and cluster-based outlier factor (iLDCBOF) method that unifies incremental versions of the local outlier factor (LOF) and density-based spatial clustering of applications with noise (DBSCAN) to detect outliers efficiently in data streams. The iLDCBOF method offers several advantages over previously reported iLOF-based studies: (1) it is based on a newly developed core k-nearest neighbor (CkNN) concept to detect outliers in data streams reliably and scalably and to prevent outliers from being clustered; (2) it uses a newly developed algorithm that automatically adjusts the value of the parameter k (the number of neighbors) for different real-time applications; and (3) it uses the Mahalanobis distance metric, so its performance is not affected even for large amounts of data. The iLDCBOF method is well suited to different data stream applications because it requires no distribution assumptions, its parameters are determined automatically, and it is easy to implement. ROC-AUC and statistical test results from extensive experiments on 16 different real-world datasets showed that the iLDCBOF method significantly outperformed the benchmark methods. © 2022 Elsevier Inc. All rights reserved.
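As a rough illustration of the Mahalanobis distance and nearest-neighbor ideas named in the abstract, the Python sketch below scores each point in a small two-dimensional window by its mean Mahalanobis distance to its k nearest neighbors. It is not the iLDCBOF algorithm. The function names, the fixed k, the toy window, and the choice to estimate the covariance from the whole window are all assumptions made for illustration.

```python
import math


def covariance_2d(points):
    """Sample covariance (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between a and b under a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Quadratic form d^T * inverse(cov) * d, using the closed-form 2x2 inverse.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


def knn_scores(points, k):
    """Mean Mahalanobis distance from each point to its k nearest neighbors.

    Hypothetical helper: a larger score suggests a sparser neighborhood.
    """
    cov = covariance_2d(points)
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            mahalanobis_2d(p, other, cov)
            for j, other in enumerate(points)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy window (assumed data): five nearby points and one distant point.
window = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
scores = knn_scores(window, k=2)
most_isolated = max(range(len(window)), key=lambda i: scores[i])
```

In this toy window the distant point `(10, 10)` receives the largest score. Because the covariance is estimated from a window that includes that point, the distances are distorted along its direction. A streaming method would have to deal with this kind of contamination and would also update its statistics incrementally rather than recomputing them for every window.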
Pages: 901-920
Number of pages: 20