A COMPARATIVE STUDY FOR OUTLIER DETECTION METHODS IN HIGH DIMENSIONAL TEXT DATA

Cited by: 5
Authors
Park, Cheong Hee [1]
Affiliations
[1] Chungnam Natl Univ, Dept Comp Sci & Engn, 220 Gung Dong, Daejeon 305763, South Korea
Keywords
Curse of dimensionality; Dimension reduction; High dimensional text data; Outlier detection; KURTOSIS;
DOI
10.2478/jaiscr-2023-0001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Outlier detection aims to find a data sample that is significantly different from other data samples. Various outlier detection methods have been proposed and have been shown to be able to detect anomalies in many practical problems. However, in high dimensional data, conventional outlier detection methods often behave unexpectedly due to a phenomenon called the curse of dimensionality. In this paper, we compare and analyze outlier detection performance in various experimental settings, focusing on text data with dimensions typically in the tens of thousands. Experimental setups were simulated to compare the performance of outlier detection methods in unsupervised versus semi-supervised mode and uni-modal versus multi-modal data distributions. The performance of outlier detection methods based on dimension reduction is compared, and a discussion on using k-NN distance in high dimensional data is also provided. Analysis through experimental comparison in various environments can provide insights into the application of outlier detection methods in high dimensional data.
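The k-NN distance mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's experimental code: the function name `knn_distance_scores`, the synthetic Gaussian data, the 50-dimensional setting, and the choice k = 5 are all assumptions made for illustration. Each sample is scored by its Euclidean distance to its k-th nearest neighbour, and a larger score suggests a more outlying sample.

```python
import numpy as np


def knn_distance_scores(X, k=5):
    """Score each row of X by the Euclidean distance to its k-th nearest neighbour.

    Hypothetical helper for illustration; larger scores suggest outliers.
    """
    X = np.asarray(X, dtype=float)
    # Squared pairwise distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(X * X, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clamp small negatives from rounding
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbour
    # k-th smallest squared distance in each row
    kth_sq = np.partition(d2, k - 1, axis=1)[:, k - 1]
    return np.sqrt(kth_sq)


# Synthetic data (an assumption): 200 Gaussian inliers plus one shifted point.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 50))
outlier = np.full((1, 50), 6.0)
X = np.vstack([inliers, outlier])

scores = knn_distance_scores(X, k=5)
top = int(np.argmax(scores))  # index of the highest-scoring sample
```

On real text data the rows would typically be sparse term-weight vectors with tens of thousands of dimensions, so a dense pairwise distance matrix like the one above may not be practical there.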
Pages: 5-17
Number of pages: 13
Related papers
50 in total
  • [41] OUTLIER DETECTION BASED ON DENSITY OF HYPERCUBE IN HIGH-DIMENSIONAL DATA STREAM
    Shou, Zhaoyu
    Zou, Fengbo
    Li, Simin
    Lu, Xianying
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2019, 15 (03): : 873 - 889
  • [42] Variational autoencoder-based outlier detection for high-dimensional data
    Li, Yongmou
    Wang, Yijie
    Ma, Xingkong
    INTELLIGENT DATA ANALYSIS, 2019, 23 (05) : 991 - 1002
  • [43] A LoOP based outlier detection method for high dimensional fuzzy data set
    Jahromi, Alireza Fakharzadeh
    Zarei, Fateme
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2017, 32 (01) : 241 - 248
  • [44] A hybrid dimensionality reduction method for outlier detection in high-dimensional data
    Guanglei Meng
    Biao Wang
    Yanming Wu
    Mingzhe Zhou
    Tiankuo Meng
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 3705 - 3718
  • [45] A Comparative Study of Cluster Based Outlier Detection, Distance Based Outlier Detection and Density Based Outlier Detection Techniques
    Mandhare, Harshada C.
    Idate, S. R.
    2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2017, : 931 - 935
  • [46] Outlier detection with data mining techniques and statistical methods
    Orellana, Marcos
    Cedillo, Priscila
    ENFOQUE UTE, 2020, 11 (01): : 56 - 67
  • [47] Outlier detection for compositional data using robust methods
    Filzmoser, Peter
    Hron, Karel
    MATHEMATICAL GEOSCIENCES, 2008, 40 (03) : 233 - 248
  • [48] A comparison of multiple outlier detection methods for regression data
    Billor, Nedret
    Kiral, Gulsen
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2008, 37 (03) : 521 - 545
  • [49] Robust Multivariate Outlier Detection Methods for Environmental Data
    Alameddine, Ibrahim
    Kenney, Melissa A.
    Gosnell, Russell J.
    Reckhow, Kenneth H.
    JOURNAL OF ENVIRONMENTAL ENGINEERING-ASCE, 2010, 136 (11): : 1299 - 1304
  • [50] Visual interactive evolutionary algorithm for high dimensional outlier detection and data clustering problems
    Boudjeloud-Assala, Lydia
    INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, 2012, 4 (01) : 6 - 13