A Probabilistic Transformation of Distance-Based Outliers

Cited by: 7
Authors
Muhr, David [1, 2]
Affenzeller, Michael [2]
Kueng, Josef [3]
Affiliations
[1] BMW Grp, A-4400 Steyr, Austria
[2] Johannes Kepler Univ Linz, Inst Formal Models & Verificat, A-4040 Linz, Austria
[3] Johannes Kepler Univ Linz, Inst Applicat Oriented Knowledge Proc, A-4040 Linz, Austria
Source
Keywords
anomaly detection; outlier detection; novelty detection; outlier scores; anomaly scores; score normalization; score distribution; score contrast; distance distribution; outlier probabilities; neighbor; algorithms
DOI
10.3390/make5030042
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The scores of distance-based outlier detection methods are difficult to interpret, and it is challenging to determine a suitable cut-off threshold between normal and outlier data points without additional context. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Determining distance relationships between data points is necessary to identify the nearest-neighbor relationships in the data, yet most of the computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions and, subsequently, use the distributions to turn distance-based outlier scores into outlier probabilities. Over a variety of tabular and image benchmark datasets, we show that the probabilistic transformation does not impact outlier ranking (ROC AUC) or detection performance (AP, F1), and increases the contrast between normal and outlier score distributions (statistical distance). The experimental findings indicate that it is possible to transform distance-based outlier scores into interpretable probabilities with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods, and, because existing distance computations are used, it adds no significant computational overhead.
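The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a k-nearest-neighbor distance as the outlier score and swaps in a plain empirical CDF over all pairwise distances as the distance distribution, which is only one of many possible choices. The function names `knn_scores_and_distances` and `empirical_cdf_transform` are hypothetical. Because the CDF is non-decreasing, the transformed values lie in [0, 1] and keep the original ranking (up to ties), and the pairwise distances are computed once and reused for both steps.

```python
import numpy as np


def knn_scores_and_distances(X, k):
    """Return k-NN distance scores and the off-diagonal pairwise distances.

    Assumption: Euclidean distance and a brute-force distance matrix,
    chosen only to keep the sketch short.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(X)
    # Drop self-distances; each row keeps its n - 1 distances to other points.
    off_diag = D[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    # Outlier score: distance to the k-th nearest neighbor.
    scores = np.sort(off_diag, axis=1)[:, k - 1]
    return scores, off_diag


def empirical_cdf_transform(scores, reference):
    """Map each score to the fraction of reference distances <= that score.

    Assumption: an empirical CDF stands in for a fitted distance
    distribution. The mapping is non-decreasing, so ranking is
    preserved up to ties.
    """
    ref = np.sort(reference.ravel())
    return np.searchsorted(ref, scores, side="right") / ref.size


# Usage: 50 points from a standard normal plus one distant point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])
scores, dists = knn_scores_and_distances(X, k=5)
probs = empirical_cdf_transform(scores, dists)
```

In this sketch the raw scores are unbounded distances, while `probs` can be read as "the share of observed distances that are no larger than this point's k-NN distance", which is easier to threshold without extra context.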
Pages: 782-802
Number of pages: 21
Related papers
50 in total
  • [41] Effects of Distance-Based Defer Times and Probabilistic Channel to Time-Stable Geocast
    Kheawchaoom, Phuchong
    Kittipiyakul, Somsak
    WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, WCECS 2012, VOL II, 2012, : 895 - 900
  • [42] A Generative Approach to Open Set Recognition Using Distance-Based Probabilistic Anomaly Augmentation
    Goodman, Joel
    Sarkani, Shahram
    Mazzuchi, Thomas
    IEEE Access, 2022, 10 : 42218 - 42228
  • [43] Distance-based shape statistics
    Charpiat, Guillaume
    Faugeras, Olivier
    Keriven, Renaud
    Maurel, Pierre
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 5783 - 5786
  • [44] Distance-Based Sound Separation
    Patterson, Katharine
    Wilson, Kevin
    Wisdom, Scott
    Hershey, John R.
    INTERSPEECH 2022, 2022, : 901 - 905
  • [45] Local distance-based classification
    Laguia, Manuel
    Castro, Juan Luis
    KNOWLEDGE-BASED SYSTEMS, 2008, 21 (07) : 692 - 703
  • [46] Axioms for Distance-Based Centralities
    Skibski, Oskar
    Sosnowska, Jadwiga
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 1218 - 1225
  • [47] Distance-based multilayer perceptrons
    Duch, W
    Adamczak, R
    Diercksen, GHF
    COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION - NEURAL NETWORKS & ADVANCED CONTROL STRATEGIES, 1999, 54 : 75 - 80
  • [48] Distance-based repairs of databases
    Arieli, Ofer
    Denecker, Marc
    Bruynooghe, Maurice
    LOGICS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4160 : 43 - 55
  • [49] Distance-Based Statistical Inference
    Markatou, Marianthi
    Karlis, Dimitrios
    Ding, Yuxin
    ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 8, 2021, 2021, 8 : 301 - 327
  • [50] Distance-based multimedia indexing
    Data Management and Data Exploration Group, RWTH Aachen University, Germany
    Unknown
    Adv. Database Technol. - EDBT, 1600, (722-723):