Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance

Authors
Ji Zhang
Hai Wang
Affiliations
[1] Dalhousie University, Faculty of Computer Science
[2] Saint Mary's University, Sobey School of Business
Source
Knowledge and Information Systems | 2006 / Vol. 10
Keywords
Outlying subspace; High-dimensional data; Outlier detection; Dynamic subspace search
DOI
Not available
Abstract
In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data: finding the subspaces (subsets of features) in which given points are outliers, called their outlying subspaces. Because state-of-the-art outlier detection techniques cannot handle this problem, we propose a novel algorithm, High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of high-dimensional data efficiently. The intuition behind HighDOD is to measure the OD of a point as the sum of the distances between the point and its k nearest neighbors. Two heuristic pruning strategies speed up the subspace search, and an efficient dynamic subspace search method with a sample-based learning process is implemented. Experimental results show that HighDOD is efficient and outperforms alternative search methods such as naive top-down, bottom-up, and random search, and that existing outlier detection methods cannot fulfill this new task effectively.
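The OD measure described in the abstract can be illustrated with a small sketch. This is not the HighDOD algorithm itself: the abstract does not specify the distance metric, the outlier threshold, or the pruning rules, so the Euclidean distance, the `threshold` parameter, the exhaustive enumeration, and all function names below are assumptions made purely for illustration.

```python
import math
from itertools import combinations


def outlying_degree(point, data, subspace, k):
    """Sum of distances from `point` to its k nearest neighbours,
    computed only over the feature indices in `subspace`.
    Euclidean distance is an assumption; the abstract names no metric."""
    dists = sorted(
        math.sqrt(sum((point[d] - other[d]) ** 2 for d in subspace))
        for other in data
        if other is not point  # skip the query point itself
    )
    return sum(dists[:k])


def outlying_subspaces_naive(point, data, k, threshold):
    """Exhaustively enumerate every non-empty subspace, smallest first,
    and keep those whose OD reaches `threshold` (a hypothetical cut-off).
    This is a naive baseline, not the dynamic search of the paper."""
    n_dims = len(point)
    found = []
    for size in range(1, n_dims + 1):
        for subspace in combinations(range(n_dims), size):
            if outlying_degree(point, data, subspace, k) >= threshold:
                found.append(subspace)
    return found


# Toy data: the last point deviates only in feature 1.
data = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.1),
    (0.0, 0.1, 0.0),
    (0.1, 0.1, 0.1),
    (0.05, 5.0, 0.05),
]
query = data[4]
print(outlying_subspaces_naive(query, data, k=2, threshold=5.0))
```

On this toy data the query point is reported as outlying in exactly the subspaces that contain feature 1. The exhaustive loop visits all 2^d - 1 subspaces, which is why a pruned search matters in high dimensions. Under Euclidean distance, adding a feature can never decrease this OD, which is the kind of property a pruning heuristic could exploit; whether HighDOD's two strategies rely on it is not stated in the abstract.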
Pages: 333-355
Page count: 22