Locally centred Mahalanobis distance: A new distance measure with salient features towards outlier detection

Cited by: 51
Authors
Todeschini, Roberto [1]
Ballabio, Davide [1]
Consonni, Viviana [1]
Sahigara, Faizan [1]
Filzmoser, Peter [2]
Affiliations
[1] Univ Milano Bicocca, Dept Earth & Environm Sci, Milano Chemometr & QSAR Res Grp, I-20126 Milan, Italy
[2] Vienna Univ Technol, Dept Stat & Probabil Theory, A-1040 Vienna, Austria
Keywords
Mahalanobis distance; Outlier detection; Similarity; Isolation degree; Remoteness; Covariance matrix; Data mining;
DOI
10.1016/j.aca.2013.04.034
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Outlier detection is a prerequisite for identifying aberrant samples in a given data set. Identifying such samples is particularly important in multivariate data analysis, where increasing dimensionality can easily hinder data exploration and outliers often go undetected. This paper introduces a novel Mahalanobis distance measure (strictly, a pseudo-distance), termed the locally centred Mahalanobis distance, derived by centring the covariance matrix at each data sample rather than at the data centroid, as in the classical covariance matrix. Two parameters, called Remoteness and Isolation degree, are derived from the resulting pairwise distance matrix; their salient features facilitate better identification of atypical samples isolated from the rest of the data, reflecting their potential application to outlier detection. The Isolation degree proved able to detect a new kind of outlier, namely isolated samples within the data domain, making it a useful diagnostic tool for evaluating the reliability of predictions obtained by local models (e.g. k-NN models). To better understand the role of Remoteness and Isolation degree in identifying such aberrant samples, simulated data sets and published data sets from the literature were considered as case studies, and the results were compared with those obtained using the Euclidean distance and the classical Mahalanobis distance. (C) 2013 Elsevier B.V. All rights reserved.
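The abstract describes the construction only in words, so the following is a minimal sketch rather than the authors' implementation. It assumes that the covariance matrix centred at sample i is the average outer product of the deviations of all samples from sample i (normalised by n - 1), and that the pseudo-distance from i to j is the usual Mahalanobis quadratic form evaluated with that matrix. The function name `locally_centred_mahalanobis`, the normalisation, and the use of a pseudo-inverse are all assumptions for illustration. Remoteness and Isolation degree are not computed because their definitions are not given in this record.

```python
import numpy as np


def locally_centred_mahalanobis(X):
    """Return an n x n matrix D where D[i, j] is the pseudo-distance from
    sample i to sample j, using a covariance matrix centred at sample i.

    Assumed definition (not stated in the abstract):
        S_i     = sum_k (x_k - x_i)(x_k - x_i)^T / (n - 1)
        D[i, j] = sqrt((x_j - x_i)^T S_i^{-1} (x_j - x_i))
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    D = np.zeros((n, n))
    for i in range(n):
        diff = X - X[i]                   # deviations of every sample from sample i
        S_i = diff.T @ diff / (n - 1)     # covariance centred at sample i (assumed scaling)
        S_inv = np.linalg.pinv(S_i)       # pseudo-inverse guards against singular S_i
        q = np.einsum("jk,kl,jl->j", diff, S_inv, diff)  # quadratic form per sample j
        D[i] = np.sqrt(np.maximum(q, 0.0))  # clip tiny negative rounding errors
    return D


# Hypothetical toy data: a tight cluster plus one distant point.
rng = np.random.default_rng(42)
toy = np.vstack([rng.normal(size=(15, 2)), [[8.0, 8.0]]])
D_toy = locally_centred_mahalanobis(toy)
print(D_toy.shape)
```

Under this assumed definition the matrix is generally not symmetric, because row i uses a covariance matrix specific to sample i; this is consistent with the abstract calling the measure a pseudo-distance.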
Pages: 1-9
Number of pages: 9
Related papers
50 in total
  • [1] Outlier detection in cylindrical data based on Mahalanobis distance
    Dhamale, Prashant S.
    Kashikar, Akanksha S.
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2025, 54 (02) : 331 - 341
  • [2] An Outlier Detection Method Based on Mahalanobis Distance for Source Localization
    Yan, Qingli
    Chen, Jianfeng
    De Strycker, Lieven
    SENSORS, 2018, 18 (07)
  • [3] A novel spatial outlier detection algorithm based on Mahalanobis distance
    Wen, Junhao
    Wu, Hongyan
    Wu, Zhongfu
    DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13E : 3574 - 3577
  • [4] Outlier Detection Algorithm based on Mahalanobis Distance for Wireless Sensor Networks
    Titouna, Chafiq
    Titouna, Faiza
    Ari, Ado Adamou Abba
    2019 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI - 2019), 2019,
  • [5] Efficient Implementation of Mahalanobis Distance on Ferroelectric FinFET Crossbar for Outlier Detection
    Rafiq, Musaib
    Chauhan, Yogesh Singh
    Sahay, Shubham
    IEEE JOURNAL OF THE ELECTRON DEVICES SOCIETY, 2024, 12 : 516 - 524
  • [6] Multivariate outlier detection based on a robust Mahalanobis distance with shrinkage estimators
    Cabana, Elisa
    Lillo, Rosa E.
    Laniado, Henry
    STATISTICAL PAPERS, 2021, 62 (04) : 1583 - 1609
  • [7] Multivariate outlier detection based on a robust Mahalanobis distance with shrinkage estimators
    Elisa Cabana
    Rosa E. Lillo
    Henry Laniado
    Statistical Papers, 2021, 62 : 1583 - 1609
  • [8] Outlier detection with Mahalanobis square distance: incorporating small sample correction factor
    Ekiz, Meltem
    Ekiz, O. Ufuk
    JOURNAL OF APPLIED STATISTICS, 2017, 44 (13) : 2444 - 2457
  • [9] Mahalanobis Distance Based Multivariate Outlier Detection to Improve Performance of Hypertension Prediction
    Khongorzul Dashdondov
    Mi-Hye Kim
    Neural Processing Letters, 2023, 55 : 265 - 277
  • [10] Mahalanobis Distance Based Multivariate Outlier Detection to Improve Performance of Hypertension Prediction
    Dashdondov, Khongorzul
    Kim, Mi-Hye
    NEURAL PROCESSING LETTERS, 2023, 55 (01) : 265 - 277