A Probabilistic Transformation of Distance-Based Outliers

Cited by: 7
Authors
Muhr, David [1, 2]
Affenzeller, Michael [2 ]
Kueng, Josef [3 ]
Affiliations
[1] BMW Grp, A-4400 Steyr, Austria
[2] Johannes Kepler Univ Linz, Inst Formal Models & Verificat, A-4040 Linz, Austria
[3] Johannes Kepler Univ Linz, Inst Applicat Oriented Knowledge Proc, A-4040 Linz, Austria
Source
Keywords
anomaly detection; outlier detection; novelty detection; outlier scores; anomaly scores; score normalization; score distribution; score contrast; distance distribution; outlier probabilities; neighbor; algorithms
DOI
10.3390/make5030042
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The scores of distance-based outlier detection methods are difficult to interpret, and it is challenging to determine a suitable cut-off threshold between normal and outlier data points without additional context. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Determining distance relationships between data points is necessary to identify the nearest-neighbor relationships in the data, yet most of the computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions and, subsequently, use the distributions to turn distance-based outlier scores into outlier probabilities. Over a variety of tabular and image benchmark datasets, we show that the probabilistic transformation does not impact outlier ranking (ROC AUC) or detection performance (AP, F1), and increases the contrast between normal and outlier score distributions (statistical distance). The experimental findings indicate that it is possible to transform distance-based outlier scores into interpretable probabilities with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods, and, because existing distance computations are used, it adds no significant computational overhead.
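The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a k-nearest-neighbor distance as the outlier score and uses an empirical CDF over all pairwise distances as a stand-in for the modeled distance distribution. The function names, the choice k=5, and the synthetic data are hypothetical.

```python
import numpy as np

def knn_scores_and_distances(X, k=5):
    """Return k-NN distance scores and all pairwise distances (illustrative helper)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))      # full Euclidean distance matrix
    D_sorted = np.sort(D, axis=1)              # column 0 is the self-distance (0)
    scores = D_sorted[:, k]                    # distance to the k-th nearest neighbor
    pairwise = D[np.triu_indices(len(X), k=1)] # each unordered pair once, no diagonal
    return scores, pairwise

def to_probability(scores, pairwise):
    """Map scores to [0, 1] via the empirical CDF of the pairwise distances.

    Assumption: the empirical CDF stands in for a fitted distance distribution.
    It is monotone non-decreasing, so the score ranking is kept (up to ties).
    """
    ref = np.sort(pairwise)
    return np.searchsorted(ref, scores, side="right") / ref.size

# Synthetic data: 200 standard-normal points plus one distant point at index 200.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])

scores, pairwise = knn_scores_and_distances(X, k=5)
probs = to_probability(scores, pairwise)
```

The sketch reuses the distance matrix that the neighbor search already requires, which mirrors the abstract's point that most computed distances are otherwise discarded. A parametric distribution fitted to those distances could replace the empirical CDF without changing the surrounding code.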
Pages: 782 - 802
Page count: 21
Related papers
50 in total
  • [31] A Probabilistic Encounter and Distance-based Routing Protocol for Opportunistic Networks
    Dhurandher, Sanjay K.
    Borah, Satya J.
    Woungang, Isaac
    Gupta, Sahil
    Kuchal, Pragya
    Takizawa, Makoto
    Barolli, Leonard
    ADVANCES ON BROAD-BAND WIRELESS COMPUTING, COMMUNICATION AND APPLICATIONS, 2017, 2 : 491 - 499
  • [32] DDTM: A Distance-Based Data Transformation Method for Time Series Classification
    Xu, Huarong
    Wang, Ke
    Sun, Wu
    Chen, Mei
    Li, Hui
    Zhao, Heng
    ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2023, 2024, 1998 : 94 - 111
  • [33] Digital Image Colorization Based on Probabilistic Distance Transformation
    Lagodzinski, Przemyslaw
    Smolka, Bogdan
    PROCEEDINGS ELMAR-2008, VOLS 1 AND 2, 2008, : 495 - +
  • [34] Data transformation techniques for preserving privacy in distance-based mining algorithms
    Kadampur, Mohammad Ali
    Somayajulu, D. V. L. N.
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2014, 6 (03) : 285 - 311
  • [35] Top (k1, k2) Distance-based Outliers Detection in an Uncertain Dataset
    Liu, Fei
    Jia, Yan
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2290 - 2299
  • [36] A Generative Approach to Open Set Recognition Using Distance-Based Probabilistic Anomaly Augmentation
    Goodman, Joel
    Sarkani, Shahram
    Mazzuchi, Thomas
    IEEE ACCESS, 2022, 10 : 42218 - 42228
  • [37] k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint
    Andrzej Młodak
    Journal of Classification, 2021, 38 : 313 - 352
  • [38] Distance-based dynamically adjusted probabilistic forwarding for wireless mobile ad hoc networks
    Khan, Imran Ali
    Javaid, Akmal
    Qian, Hua Lin
    2008 IFIP INTERNATIONAL CONFERENCE ON WIRELESS AND OPTICAL COMMUNICATIONS NETWORKS, 2008, : 117 - +
  • [39] Imprecise Probabilistic Model Updating Using A Wasserstein Distance-based Uncertainty Quantification Metric
    Yang, Lechang
    Han, Dongxu
    Wang, Pidong
    Jixie Gongcheng Xuebao/Journal of Mechanical Engineering, 2022, 58 (24) : 300 - 311
  • [40] k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint
    Mlodak, Andrzej
    JOURNAL OF CLASSIFICATION, 2021, 38 (02) : 313 - 352