A COMPARATIVE STUDY FOR OUTLIER DETECTION METHODS IN HIGH DIMENSIONAL TEXT DATA

Cited by: 5
Authors
Park, Cheong Hee [1]
Affiliations
[1] Chungnam Natl Univ, Dept Comp Sci & Engn, 220 Gung Dong, Daejeon 305763, South Korea
Keywords
Curse of dimensionality; Dimension reduction; High dimensional text data; Outlier detection; KURTOSIS
DOI
10.2478/jaiscr-2023-0001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection aims to find data samples that differ significantly from the rest of the data. Various outlier detection methods have been proposed and have been shown to detect anomalies in many practical problems. However, in high dimensional data, conventional outlier detection methods often behave unexpectedly due to a phenomenon called the curse of dimensionality. In this paper, we compare and analyze outlier detection performance in various experimental settings, focusing on text data with dimensions typically in the tens of thousands. Experimental setups were simulated to compare the performance of outlier detection methods in unsupervised versus semi-supervised mode and on uni-modal versus multi-modal data distributions. The performance of outlier detection methods based on dimension reduction is compared, and a discussion on using k-NN distance in high dimensional data is also provided. Analysis through experimental comparison in various environments can provide insights into the application of outlier detection methods in high dimensional data.
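For readers unfamiliar with the k-NN distance mentioned in the abstract, the following is a minimal sketch of one common way to turn it into an outlier score: each sample is scored by the Euclidean distance to its k-th nearest neighbor, so isolated samples receive larger scores. It is not taken from the paper; the function name `knn_distance_scores`, the value `k=5`, and the synthetic Gaussian data are assumptions chosen only for illustration.

```python
import numpy as np


def knn_distance_scores(X, k=5):
    """Score each row of X by the Euclidean distance to its k-th nearest neighbor.

    Hypothetical helper for illustration; larger scores suggest more isolated samples.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X * X, axis=1)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives caused by rounding
    np.fill_diagonal(d2, np.inf)     # a sample is not its own neighbor
    # k-th smallest squared distance in each row
    kth = np.partition(d2, k - 1, axis=1)[:, k - 1]
    return np.sqrt(kth)


# Synthetic data (an assumption, not the paper's text corpus):
# 200 Gaussian inliers in 50 dimensions plus one point far from the cloud.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 50))
outlier = np.full((1, 50), 6.0)
X = np.vstack([inliers, outlier])

scores = knn_distance_scores(X, k=5)
top = int(np.argmax(scores))  # index of the highest-scoring sample
```

In this toy setting the appended point is far from every inlier, so it receives the largest score. In real text data with tens of thousands of dimensions, pairwise distances tend to concentrate, which is the kind of behavior the abstract attributes to the curse of dimensionality.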
Pages: 5-17
Number of pages: 13
Related papers
  • [31] A Comparative Study of Outlier Detection Algorithms
    Isaksson, Charlie
    Dunham, Margaret H.
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, 2009, 5632 : 440 - 453
  • [32] Discussion of Outlier Detection Methods of Purchasing Data
    Kono, Katsuya
    Yamamoto, Yoshiro
    2016 14TH INTERNATIONAL CONFERENCE ON ICT AND KNOWLEDGE ENGINEERING (ICT&KE), 2016, : 12 - 18
  • [33] WMEVF: AN OUTLIER DETECTION METHODS FOR CATEGORICAL DATA
    Rokhman, Nur
    Subanar
    Winarko, Edi
    2016 INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTING (ICIC), 2016, : 37 - 42
  • [34] A hybrid dimensionality reduction method for outlier detection in high-dimensional data
    Meng, Guanglei
    Wang, Biao
    Wu, Yanming
    Zhou, Mingzhe
    Meng, Tiankuo
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (11) : 3705 - 3718
  • [35] Unsupervised Artificial Neural Networks for Outlier Detection in High-Dimensional Data
    Popovic, Daniel
    Fouche, Edouard
    Boehm, Klemens
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019, 2019, 11695 : 3 - 19
  • [36] Fast outlier detection for high-dimensional data of wireless sensor networks
    Qiao, Yan
    Cui, Xinhong
    Jin, Peng
    Zhang, Wu
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2020, 16 (10)
  • [37] Binary Gravitational Subspace Search for Outlier Detection in High Dimensional Data Streams
    Souiden, Imen
    Brahmi, Zaki
    Omri, Mohamed Nazih
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II, 2022, 13726 : 157 - 169
  • [38] Randomized Robust Subspace Recovery and Outlier Detection for High Dimensional Data Matrices
    Rahmani, Mostafa
    Atia, George K.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017, 65 (06) : 1580 - 1594
  • [39] Visual interactive evolutionary algorithm for high dimensional data clustering and outlier detection
    Boudjeloud, L
    Poulet, F
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2005, 3518 : 426 - 431
  • [40] Ordinal Outlier Algorithm for Anomaly Detection of High-Dimensional Data Sets
    Chen, Gang
    Du, Linlin
    An, Baoran
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 5356 - 5361