Flexible density peak clustering for real-world data

Cited by: 2
Authors
Hou, Jian [1]
Lin, Houshen [1]
Yuan, Huaqiang [1]
Pelillo, Marcello [2, 3]
Affiliations
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan 523808, Peoples R China
[2] Ca Foscari Univ, DAIS, I-30172 Venice, Italy
[3] Ca Foscari Univ, European Ctr Living Technol, I-30123 Venice, Italy
Funding
National Natural Science Foundation of China
Keywords
Clustering; Density peak; Real-world data; Number of clusters; FAST SEARCH; K-MEANS; FIND;
DOI
10.1016/j.patcog.2024.110772
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In density-based clustering, the density peak algorithm has attracted much attention for its effectiveness and simplicity, and a large number of clustering approaches have been built on it. Some of these works require manual selection of cluster centers from a decision graph, and this human involvement introduces uncertainty into the clustering results. To avoid human involvement, other algorithms rely on a user-specified number of clusters to determine cluster centers automatically. However, accurately estimating the number of clusters is a well-known, long-standing difficulty in data clustering. In this paper we present a sequential density peak clustering algorithm that extracts clusters one by one, thereby determining the number of clusters automatically while avoiding manual selection of cluster centers. Starting from a density peak, our algorithm first generates an initial cluster surrounding the peak, and then obtains the final cluster by expanding the initial cluster based on the relative density relationship among neighboring data points. With a peeling-off strategy, we obtain all the clusters sequentially. Our algorithm works well with clusters of Gaussian distribution and therefore has potential for clustering real-world data. Experiments on a large number of synthetic and real datasets, together with comparisons against existing algorithms, demonstrate the effectiveness of the proposed algorithm.
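The sequential, peel-off idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the density estimator, the rule for forming the initial cluster, or the exact relative-density criterion, so the Gaussian-kernel density, the cutoff distance `dc`, and both membership rules below are assumptions chosen only to make the control flow concrete. The function name `sequential_density_peak` is hypothetical.

```python
import numpy as np

def sequential_density_peak(X, dc):
    """Toy sequential density-peak clustering (every rule here is an assumption)."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Gaussian-kernel local density with cutoff dc (assumed estimator).
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1)
    labels = np.full(n, -1)
    cluster_id = 0
    while (labels == -1).any():
        free = labels == -1
        # Density peak: the densest point that is still unassigned.
        peak = np.flatnonzero(free)[np.argmax(rho[free])]
        # Initial cluster: unassigned points within dc of the peak (assumed rule).
        members = free & (dist[peak] <= dc)
        # Expansion: add an unassigned point if it lies within dc of a member
        # whose density is at least as high (assumed relative-density rule).
        while True:
            member_idx = np.flatnonzero(members)
            cand_idx = np.flatnonzero(free & ~members)
            if cand_idx.size == 0:
                break
            near = dist[np.ix_(cand_idx, member_idx)] <= dc
            denser = rho[member_idx][None, :] >= rho[cand_idx][:, None]
            grow = cand_idx[(near & denser).any(axis=1)]
            if grow.size == 0:
                break
            members[grow] = True
        # Peel the finished cluster off and continue with the remaining points.
        labels[members] = cluster_id
        cluster_id += 1
    return labels
```

Calling `sequential_density_peak(X, dc=1.0)` on an `(n, 2)` array returns one integer label per point, and the number of distinct labels is the number of clusters found, with no cluster count supplied by the user. In this sketch the result still depends on `dc`, which is a simplification of this illustration and not a claim about the paper's method.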
Pages: 13