A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes

Authors
Anna Koufakou
Michael Georgiopoulos
Affiliations
[1] Florida Gulf Coast University, U.A. Whitaker School of Engineering
[2] University of Central Florida, School of EECS
Keywords
Outlier detection; Anomaly detection; Data mining; Distributed data sets; Mixed attribute data sets; High-dimensional data sets
Abstract
Outlier detection has attracted substantial attention in many applications and research areas; prominent applications include network intrusion detection and credit card fraud detection. Many existing approaches are based on calculating distances among the points in the dataset. These approaches cannot easily adapt to current datasets, which usually contain a mix of categorical and continuous attributes and may be distributed among different geographical locations. In addition, current datasets usually have a large number of dimensions. Such datasets tend to be sparse, and traditional concepts such as Euclidean distance or nearest neighbor become unsuitable. We propose a fast distributed outlier detection strategy intended for datasets containing mixed attributes. The proposed method takes the sparseness of the dataset into consideration and is experimentally shown to be highly scalable with the number of points and the number of attributes in the dataset. Experimental results show that the proposed method compares very favorably with other state-of-the-art outlier detection strategies proposed in the literature, and that the speedup achieved by its distributed version is very close to linear.
Pages: 259-289
Page count: 30