Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm

Citations: 0
Authors
Tsagris, Michail [1]
Papadakis, Manos [2]
Alenazi, Abdulaziz [2]
Alzeley, Omar [3]
Affiliations
[1] Univ Crete, Dept Econ, Gallos Campus, Rethimnon 74100, Greece
[2] Northern Border Univ, Coll Sci, Dept Math, Ar Ar 73213, Saudi Arabia
[3] Umm Al Qura Univ, Al Qunfudah Univ Coll, Dept Math, Mecca 24382, Saudi Arabia
Keywords
high-dimensional data; outliers; computational efficiency; 6208
DOI
10.3390/computation12090185
Chinese Library Classification
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Outlier detection, or anomaly detection as it is known in the machine learning community, has gained interest in recent years, and it is commonly used when the sample size is smaller than the number of variables. In 2015, an outlier detection procedure was proposed [7] for this high-dimensional setting, replacing the classic minimum covariance determinant estimator with the minimum diagonal product estimator. Computationally speaking, that method has two drawbacks: (a) it is not computationally efficient and does not scale up, and (b) it is not memory efficient and, in some cases, cannot be applied because of memory limits. We address the first issue via efficient code written in both R and C++, whereas for the second issue we utilize the eigendecomposition and its properties. Experiments are conducted using simulated data to showcase the time improvement, while gene expression data are used to further examine some extra practicalities associated with the algorithm. The simulation studies yield a speed-up factor that ranges between 17 and 1800, implying a successful reduction in the estimator's computational burden.
Pages: 10