Outlier detection method based on improved DPC algorithm and centrifugal factor

Cited by: 0
Authors
Xia, Hao [1]
Zhou, Yu [1]
Li, Jiguang [2]
Yue, Xuezhen [1]
Li, Jichun [3]
Affiliations
[1] North China Univ Water Resources & Elect Power, Sch Elect Engn, Zhengzhou 450045, Peoples R China
[2] Univ Salford, Sch Sci Engn & Environm, Salford M5 4NT, England
[3] Newcastle Univ, Sch Comp, Newcastle Upon Tyne NE4 5TG, England
Funding
National Natural Science Foundation of China
Keywords
Outlier detection; Clustering algorithm; Centrifugal factor; k-nearest neighbor; Local density; Local kernel density
DOI
10.1016/j.ins.2024.121255
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Outlier detection aims to identify data anomalies exhibiting significant deviations from normal patterns. However, existing outlier detection methods based on k-nearest neighbors often struggle with challenges such as increasing outlier counts and cluster formation issues. Additionally, selecting appropriate nearest-neighbor parameters presents a significant challenge, as researchers commonly evaluate detection accuracy across various k values. To enhance the accuracy and robustness of outlier detection, in this paper we propose an outlier detection method based on the improved DPC algorithm and centrifugal factor. Initially, we leverage k-nearest neighbors, k-reciprocal nearest neighbors, and a Gaussian kernel function to determine the local density of samples, particularly addressing scenarios where the DPC algorithm struggles to identify cluster centers in sparse clusters. Subsequently, to reduce the DPC algorithm's computational complexity, we screen the samples based on mutual nearest neighbor counts and select cluster centers accordingly. Non-central points are then distributed using k-nearest neighbors, k-reciprocal nearest neighbors, and reverse k-nearest neighbors. The centrifugal factor, whose magnitude reflects the outlier degree of samples, is then computed by calculating the ratio of the local kernel density at the cluster center to that of samples. Finally, we propose a method for choosing the nearest neighbor parameter, k. To comprehensively evaluate the outlier detection performance of the proposed algorithm, we conduct experiments on 12 complex synthetic datasets and 25 public real-world datasets, comparing the results with 12 state-of-the-art outlier detection methods.
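The abstract describes the centrifugal factor only in words: the ratio of the local kernel density at a sample's cluster center to the local kernel density of the sample itself. The following is a minimal sketch of that ratio, not the paper's actual formulation. Several pieces are assumptions made for illustration: the density estimate (mean Gaussian kernel value over the k nearest neighbors), the bandwidth `sigma`, the function names, taking cluster labels as given, and treating the densest member of each cluster as its center. The paper's improved DPC steps (k-reciprocal neighbors, mutual-neighbor screening, assignment of non-central points) are not reproduced here.

```python
import numpy as np

def knn_kernel_density(X, k, sigma=1.0):
    """Illustrative local kernel density: mean Gaussian kernel value
    over each sample's k nearest neighbors (assumed form, not the paper's)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero self-distance; skip it.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    return np.exp(-(knn_dist ** 2) / (2.0 * sigma ** 2)).mean(axis=1)

def centrifugal_scores(X, labels, k, sigma=1.0):
    """Ratio of the cluster center's density to each sample's density.
    Assumption: the center of a cluster is its densest member."""
    density = knn_kernel_density(X, k, sigma)
    scores = np.empty(len(X))
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        center_density = density[members].max()
        scores[members] = center_density / density[members]
    return scores

# Toy data: five tightly packed points and one distant point, one cluster.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [0.1, 0.1], [0.05, 0.05], [3.0, 3.0]])
labels = np.zeros(len(X), dtype=int)
scores = centrifugal_scores(X, labels, k=2)
print(np.argmax(scores))  # index of the sample with the largest score
```

Under these assumptions every score is at least 1, the chosen center scores exactly 1, and samples in sparse regions far from the center receive larger values, which matches the abstract's statement that the magnitude reflects the outlier degree.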
Pages: 33
Related papers
50 in total
  • [1] Improved outlier detection and interpretation method for DPC clustering algorithm
    Zhou, Yu
    Xia, Hao
    Pei, Zexuan
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2024, 56 (08): : 68 - 85
  • [2] An outlier detection algorithm based on an integrated outlier factor
    Zhou, Hongfang
    Liu, Hongjiang
    Zhang, Yingjie
    Zhang, Yao
    INTELLIGENT DATA ANALYSIS, 2019, 23 (05) : 975 - 990
  • [3] Outlier detection method based on improved distance
    School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China
    Huanan Ligong Daxue Xuebao, 2008, 9 (25-30):
  • [4] An Outlier Detection Method Based on PageRank Algorithm
    Huang, Zhan
    Long, Shun
    Jiang, Yuying
    Chen, Qian
    PROCEEDINGS OF 2016 12TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2016, : 331 - 335
  • [5] Outlier detection algorithm based on fast density peak clustering outlier factor
    Zhang, Zhongping
    Li, Sen
    Liu, Weixiong
    Liu, Shuxia
    Tongxin Xuebao/Journal on Communications, 2022, 43 (10): : 186 - 195
  • [6] An Improved KNN Based Outlier Detection Algorithm for Large Datasets
    Wang, Qian
    Zheng, Min
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2010, PT I, 2010, 6440 : 585 - 592
  • [7] A New Outlier Detection Algorithm Based on Fast Density Peak Clustering Outlier Factor
    Zhang, ZhongPing
    Li, Sen
    Liu, WeiXiong
    Wang, Ying
    Li, Daisy Xin
    INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2023, 19 (02)
  • [8] Improved Method for Noise Detection by DBSCAN and Angle Based Outlier Factor in High Dimensional Datasets
    Tripathy, Sarita
    Sahoo, Laxman
    ICCCE 2019: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND CYBER-PHYSICAL ENGINEERING, 2020, 570 : 213 - 221
  • [9] SDROF: outlier detection algorithm based on relative skewness density ratio outlier factor
    Zhang, Zhongping
    Wang, Kuo
    Dong, Jinyu
    Li, Sen
    APPLIED INTELLIGENCE, 2025, 55 (01)
  • [10] An Improved Outlier Detection Algorithm to Medical Insurance
    Xie, Zhiping
    Li, Xiaoyu
    Wu, Wenyi
    Zhang, Xiaoling
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2016, 2016, 9937 : 436 - 445