Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm

Authors
Tsagris, Michail [1]
Papadakis, Manos [2]
Alenazi, Abdulaziz [2]
Alzeley, Omar [3]
Affiliations
[1] Univ Crete, Dept Econ, Gallos Campus, Rethimnon 74100, Greece
[2] Northern Border Univ, Coll Sci, Dept Math, Ar Ar 73213, Saudi Arabia
[3] Umm Al Qura Univ, Al Qunfudah Univ Coll, Dept Math, Mecca 24382, Saudi Arabia
Keywords
high-dimensional data; outliers; computational efficiency; 6208
DOI
10.3390/computation12090185
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Outlier detection, or anomaly detection as it is known in the machine learning community, has gained interest in recent years, and it is commonly used when the sample size is smaller than the number of variables. In 2015, an outlier detection procedure was proposed for this high-dimensional setting, replacing the classic minimum covariance determinant estimator with the minimum diagonal product (MDP) estimator. Computationally, that method has two drawbacks: (a) it is not computationally efficient and does not scale up, and (b) it is not memory efficient and, in some cases, cannot be applied at all because of memory limits. We address the first issue via efficient code written in both R and C++, and the second by exploiting the eigendecomposition and its properties. Experiments on simulated data showcase the time improvement, while gene expression data are used to examine some further practicalities of the algorithm. The simulation studies yield a speed-up factor ranging between 17 and 1800, implying a successful reduction in the estimator's computational burden.
Pages: 10