A Probabilistic Transformation of Distance-Based Outliers

Cited by: 7
Authors
Muhr, David [1, 2]
Affenzeller, Michael [2 ]
Kueng, Josef [3 ]
Affiliations
[1] BMW Group, A-4400 Steyr, Austria
[2] Johannes Kepler University Linz, Institute for Formal Models and Verification, A-4040 Linz, Austria
[3] Johannes Kepler University Linz, Institute for Application-Oriented Knowledge Processing, A-4040 Linz, Austria
Source
Keywords
anomaly detection; outlier detection; novelty detection; outlier scores; anomaly scores; score normalization; score distribution; score contrast; distance distribution; outlier probabilities; neighbor; algorithms
DOI
10.3390/make5030042
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The scores of distance-based outlier detection methods are difficult to interpret, and it is challenging to determine a suitable cut-off threshold between normal and outlier data points without additional context. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Determining distance relationships between data points is necessary to identify the nearest-neighbor relationships in the data, yet most of the computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions and, subsequently, use the distributions to turn distance-based outlier scores into outlier probabilities. Over a variety of tabular and image benchmark datasets, we show that the probabilistic transformation does not impact outlier ranking (ROC AUC) or detection performance (AP, F1), and increases the contrast between normal and outlier score distributions (statistical distance). The experimental findings indicate that it is possible to transform distance-based outlier scores into interpretable probabilities with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods, and, because existing distance computations are used, it adds no significant computational overhead.
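The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' published procedure: it assumes a k-nearest-neighbor distance as the outlier score and an empirical CDF over all pairwise distances as the distance distribution. The function names `knn_scores` and `empirical_cdf_transform` are hypothetical. Because a CDF is monotone non-decreasing, the transformed values keep the original ranking (up to ties) and fall in [0, 1].

```python
import numpy as np


def knn_scores(X, k):
    """Return the pairwise distance matrix and the k-NN distance score per point."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean pairwise distances
    D_sorted = np.sort(D, axis=1)          # column 0 is the self-distance (0)
    return D, D_sorted[:, k]               # distance to the k-th nearest neighbor


def empirical_cdf_transform(D, scores):
    """Map scores to [0, 1] using the empirical CDF of all off-diagonal distances.

    Assumption: the pooled pairwise distances stand in for the modeled
    distance distribution; a fitted parametric model could be used instead.
    """
    n = D.shape[0]
    pooled = np.sort(D[~np.eye(n, dtype=bool)])  # reuse already computed distances
    # Fraction of pooled distances that are <= each score.
    return np.searchsorted(pooled, scores, side="right") / pooled.size


# Toy data: 50 points near the origin plus one far-away point (index 50).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])

D, scores = knn_scores(X, k=3)
probs = empirical_cdf_transform(D, scores)
```

The transform only reads distances that the nearest-neighbor search has already produced, which matches the abstract's point that no significant extra distance computation is needed.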
Pages: 782-802
Number of pages: 21