Outlier detection method based on improved DPC algorithm and centrifugal factor

Authors
Xia, Hao [1]
Zhou, Yu [1]
Li, Jiguang [2]
Yue, Xuezhen [1]
Li, Jichun [3]
Affiliations
[1] North China Univ Water Resources & Elect Power, Sch Elect Engn, Zhengzhou 450045, Peoples R China
[2] Univ Salford, Sch Sci Engn & Environm, Salford M5 4NT, England
[3] Newcastle Univ, Sch Comp, Newcastle Upon Tyne NE4 5TG, England
Funding
National Natural Science Foundation of China
Keywords
Outlier detection; Clustering algorithm; Centrifugal factor; k-nearest neighbor; Local density; Local kernel density
DOI
10.1016/j.ins.2024.121255
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Outlier detection aims to identify data points that deviate significantly from normal patterns. However, existing outlier detection methods based on k-nearest neighbors often degrade as the number of outliers grows or when outliers form clusters of their own. Selecting an appropriate nearest-neighbor parameter is a further challenge, and researchers commonly resort to evaluating detection accuracy across a range of k values. To improve the accuracy and robustness of outlier detection, in this paper we propose an outlier detection method based on an improved density peaks clustering (DPC) algorithm and a centrifugal factor. First, we use k-nearest neighbors, k-reciprocal nearest neighbors, and a Gaussian kernel function to determine the local density of samples, specifically addressing cases where the DPC algorithm struggles to identify cluster centers in sparse clusters. Next, to reduce the computational complexity of the DPC algorithm, we screen the samples by their mutual nearest neighbor counts and select cluster centers from the screened samples. Non-central points are then assigned to clusters using k-nearest neighbors, k-reciprocal nearest neighbors, and reverse k-nearest neighbors. The centrifugal factor, whose magnitude reflects the outlier degree of a sample, is then computed as the ratio of the local kernel density at the cluster center to that of the sample. Finally, we propose a method for choosing the nearest-neighbor parameter k. To comprehensively evaluate the outlier detection performance of the proposed algorithm, we conduct experiments on 12 complex synthetic datasets and 25 public real-world datasets, comparing the results with 12 state-of-the-art outlier detection methods.
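The density-ratio idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single cluster whose center is simply the densest sample, it omits the k-reciprocal and reverse nearest-neighbor steps and the mutual-neighbor screening, and the bandwidth rule (mean k-NN distance) and all function names are hypothetical choices made for illustration only.

```python
import math


def knn_distances(points, k):
    """For each point, return the sorted distances to its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[:k])
    return result


def local_kernel_density(points, k):
    """Gaussian-kernel density summed over each point's k nearest neighbors."""
    knn = knn_distances(points, k)
    # Assumption: the bandwidth h is the mean k-NN distance over the dataset.
    h = sum(sum(row) for row in knn) / (len(points) * k)
    return [sum(math.exp(-(d / h) ** 2) for d in row) for row in knn]


def centrifugal_factors(points, k):
    """Ratio of the center's density to each sample's density.

    Simplification: one cluster, with the densest sample taken as its center.
    Larger values indicate samples lying farther out from the dense core.
    """
    rho = local_kernel_density(points, k)
    center = max(range(len(points)), key=rho.__getitem__)
    return [rho[center] / r for r in rho], center
```

Under these assumptions, calling `centrifugal_factors(points, k)` on a compact group of samples plus one distant sample gives the distant sample the largest score, while the center itself scores exactly 1. The function requires more than `k` points in `points`.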
Pages: 33