A Probabilistic Transformation of Distance-Based Outliers

Cited by: 7
Authors
Muhr, David [1, 2]
Affenzeller, Michael [2 ]
Kueng, Josef [3 ]
Affiliations
[1] BMW Grp, A-4400 Steyr, Austria
[2] Johannes Kepler Univ Linz, Inst Formal Models & Verificat, A-4040 Linz, Austria
[3] Johannes Kepler Univ Linz, Inst Applicat Oriented Knowledge Proc, A-4040 Linz, Austria
Source
Keywords
anomaly detection; outlier detection; novelty detection; outlier scores; anomaly scores; score normalization; score distribution; score contrast; distance distribution; outlier probabilities; NOVELTY DETECTION; NEIGHBOR; ALGORITHMS;
DOI
10.3390/make5030042
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The scores of distance-based outlier detection methods are difficult to interpret, and it is challenging to determine a suitable cut-off threshold between normal and outlier data points without additional context. We describe a generic transformation of distance-based outlier scores into interpretable, probabilistic estimates. The transformation is ranking-stable and increases the contrast between normal and outlier data points. Determining distance relationships between data points is necessary to identify the nearest-neighbor relationships in the data, yet most of the computed distances are typically discarded. We show that the distances to other data points can be used to model distance probability distributions and, subsequently, use the distributions to turn distance-based outlier scores into outlier probabilities. Over a variety of tabular and image benchmark datasets, we show that the probabilistic transformation does not impact outlier ranking (ROC AUC) or detection performance (AP, F1), and increases the contrast between normal and outlier score distributions (statistical distance). The experimental findings indicate that it is possible to transform distance-based outlier scores into interpretable probabilities with increased contrast between normal and outlier samples. Our work generalizes to a wide range of distance-based outlier detection methods, and, because existing distance computations are used, it adds no significant computational overhead.
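The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the distance distributions are modeled, so the sketch assumes a k-th nearest-neighbor distance as the outlier score and an empirical CDF over all pairwise distances as the distance distribution. The function names `knn_scores_and_distances` and `empirical_cdf_probabilities` are hypothetical. Because a CDF is non-decreasing, the mapping keeps the order of the scores (up to ties), and it reuses the distances already computed for the neighbor search.

```python
import numpy as np


def knn_scores_and_distances(X, k):
    """Return k-th nearest-neighbor distances and the full distance matrix.

    Assumption: Euclidean distance and a k-NN distance score; the abstract
    covers distance-based detectors in general.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the self-distance (0),
    # so column k is the distance to the k-th nearest neighbor.
    scores = np.sort(D, axis=1)[:, k]
    return scores, D


def empirical_cdf_probabilities(scores, D):
    """Map scores to [0, 1] with the empirical CDF of all pairwise distances.

    Assumption: an empirical CDF stands in for the distance probability
    distribution; a fitted parametric model could be used instead.
    """
    n = D.shape[0]
    # Upper triangle without the diagonal: each pair counted once.
    pairwise = np.sort(D[np.triu_indices(n, k=1)])
    # Fraction of pairwise distances that are <= each score.
    return np.searchsorted(pairwise, scores, side="right") / pairwise.size
```

A point whose neighbor distance exceeds most pairwise distances in the data receives a value close to 1, while points in dense regions receive values close to 0.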
Pages: 782-802
Page count: 21
Related papers
50 in total
  • [1] Distance-based outliers in sequences
    Palshikar, GK
    DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, PROCEEDINGS, 2005, 3816 : 547 - 552
  • [2] Reducing distance computations for distance-based outliers
    Angiulli, Fabrizio
    Basta, Stefano
    Lodi, Stefano
    Sartori, Claudio
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 147
  • [3] Distance-based outliers: algorithms and applications
    Knorr, EM
    Ng, RT
    Tucakov, V
    VLDB JOURNAL, 2000, 8 (3-4): 237 - 253
  • [4] Distance-based detection and prediction of outliers
    Angiulli, F
    Basta, S
    Pizzuti, C
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, 18 (02) : 145 - 160
  • [5] Improving prediction of distance-based outliers
    Angiulli, F
    Basta, S
    Pizzuti, C
    DISCOVERY SCIENCE, PROCEEDINGS, 2004, 3245 : 89 - 100
  • [6] Distance-based outliers: algorithms and applications
    Edwin M. Knorr
    Raymond T. Ng
    Vladimir Tucakov
    The VLDB Journal, 2000, 8 : 237 - 253
  • [7] Research on algorithms for mining distance-based outliers
    Wang, LZ
    Zou, LK
    CHINESE JOURNAL OF ELECTRONICS, 2005, 14 (03): 485 - 490
  • [8] Finding intensional knowledge of distance-based outliers
    Knorr, EM
    Ng, RT
    PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 1999, : 211 - 222
  • [9] An improved distance-based outliers detection algorithm
    Tian, Sheng-wen
    Huang, Ming-ming
    General System and Control System, Vol I, 2007, : 270 - 273
  • [10] Fast mining of distance-based outliers in metric space
    State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China
    Zhejiang Daxue Xuebao (Gongxue Ban), 2009, 2 (297-302):