Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm

Cited by: 0

Authors:

Tsagris, Michail [1]

Papadakis, Manos [2]

Alenazi, Abdulaziz [2]

Alzeley, Omar [3]

Affiliations:

[1] Univ Crete, Dept Econ, Gallos Campus, Rethimnon 74100, Greece

[2] Northern Border Univ, Coll Sci, Dept Math, Ar Ar 73213, Saudi Arabia

[3] Umm Al Qura Univ, Al Qunfudah Univ Coll, Dept Math, Mecca 24382, Saudi Arabia

Source:

COMPUTATION | 2024, Vol. 12, Issue 09

Keywords:

high-dimensional data; outliers; computational efficiency; 6208;

DOI:

10.3390/computation12090185

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Discipline classification code:

0701 ; 070101 ;

Abstract:

Outlier detection, or anomaly detection as it is known in the machine learning community, has gained interest in recent years, and it is commonly used when the sample size is smaller than the number of variables. In 2015, an outlier detection procedure was proposed [7] for this high-dimensional setting, replacing the classic minimum covariance determinant estimator with the minimum diagonal product estimator. Computationally, that method has two drawbacks: (a) it is not computationally efficient and does not scale up, and (b) it is not memory efficient and, in some cases, cannot be applied because of memory limits. We address the first issue via efficient code written in both R and C++, whereas for the second issue, we utilize the eigendecomposition and its properties. Experiments are conducted using simulated data to showcase the time improvement, while gene expression data are used to further examine some extra practicalities associated with the algorithm. The simulation studies yield a speed-up factor that ranges between 17 and 1800, implying a successful reduction in the estimator's computational burden.


Pages: 10
