Incremental Shared Nearest Neighbor Density-Based Clustering

Cited by: 11
Authors
Singh, Sumeet [1]
Awekar, Amit [1]
Affiliation
[1] Indian Inst Technol, Gauhati, India
Keywords
Incremental clustering; Graph-based clustering; Shared Nearest Neighbor; Density-based clustering; Dynamic dataset
DOI
10.1145/2505515.2507837
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Shared Nearest Neighbor Density-based clustering (SNN-DBSCAN) is a robust graph-based clustering algorithm with wide applications, from climate data analysis to network intrusion detection. We propose IncSNN-DBSCAN, an incremental extension of this algorithm that can find clusters in a dataset to which frequent inserts are made. For each data point, the algorithm maintains four properties: nearest neighbor list, strengths of shared links, total connection strength and topic property. The algorithm targets only the points whose properties change. We prove that, to obtain the exact clustering, it is sufficient to recompute properties for only the targeted points, followed by possible cluster mergers on newly formed links and cluster splits on deleted links. Experiments on the KDD Cup 1999 and Mopsi search engine 2012 datasets demonstrate reductions of 75% and 99%, respectively, in the size of the set of points involved in property recomputations. By avoiding most of the redundant property computations, the algorithm achieves speedups of up to 250 and 1000 times, respectively, on those datasets, while generating exactly the same clustering as the non-incremental algorithm. We experimentally verify this claim for up to 2500 inserts on both datasets. However, the speedup comes at the cost of up to 48 times more memory usage.
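The abstract names four per-point properties and states that only the points whose properties change need to be recomputed after an insert. The Python sketch below illustrates that idea on an insert-only stream. It is not the authors' IncSNN-DBSCAN implementation. The class `SNNIndex`, the helper `build_static`, the brute-force Euclidean nearest-neighbour search, and the parameters `k`, `eps` and `min_pts` are all hypothetical. The sketch also assumes common SNN-DBSCAN conventions: a link exists only between mutual k-nearest neighbours, its strength is the number of shared neighbours, "total connection strength" is the count of links with strength at least `eps`, and a core flag (density at least `min_pts`) stands in for the abstract's "topic" property. Only core points are clustered, border and noise assignment is omitted, and clusters are recomputed from scratch rather than merged or split incrementally.

```python
import math


class SNNIndex:
    """Hypothetical container for the four per-point properties named in the abstract."""
    def __init__(self, k, eps, min_pts):
        self.k, self.eps, self.min_pts = k, eps, min_pts
        self.points = []
        self.knn = []      # 1. nearest-neighbour list, ordered by (distance, index)
        self.links = {}    # 2. shared-link strengths, keyed by (q, r) with q < r
        self.density = []  # 3. connection strength: links with strength >= eps (assumed)
        self.core = []     # 4. core flag, density >= min_pts (assumed stand-in for "topic")

    def _rank(self, i, candidates):
        """Top-k candidates by Euclidean distance to point i; ties broken by index."""
        order = sorted(candidates, key=lambda j: (math.dist(self.points[i], self.points[j]), j))
        return order[: self.k]

    def _strength(self, q, r):
        """|kNN(q) & kNN(r)| when q and r are in each other's lists, else 0."""
        if r in self.knn[q] and q in self.knn[r]:
            return len(set(self.knn[q]) & set(self.knn[r]))
        return 0

    def _set_link(self, q, r):
        """Store the current strength of link (q, r), or drop it if it no longer exists."""
        key = (min(q, r), max(q, r))
        s = self._strength(q, r)
        if s:
            self.links[key] = s
        else:
            self.links.pop(key, None)

    def _update_density(self, q):
        self.density[q] = sum(
            1 for r in self.knn[q]
            if self.links.get((min(q, r), max(q, r)), 0) >= self.eps
        )
        self.core[q] = self.density[q] >= self.min_pts

    def insert(self, x):
        """Insert x, recompute properties only for affected points, and return that set."""
        p = len(self.points)
        self.points.append(x)
        old_knn = {}
        for q in range(p):
            # The new top-k of q is the top-k of (old top-k + p).
            new = self._rank(q, self.knn[q] + [p])
            if p in new:
                old_knn[q] = self.knn[q]
                self.knn[q] = new
        self.knn.append(self._rank(p, range(p)))
        self.density.append(0)
        self.core.append(False)
        changed = set(old_knn) | {p}
        touched = set(changed)
        # Only links with an endpoint whose kNN list changed can appear, vanish or change strength.
        for a in changed:
            for r in set(self.knn[a]) | set(old_knn.get(a, [])):
                self._set_link(a, r)
                touched.add(r)
        for q in touched:
            self._update_density(q)
        return touched

    def clusters(self):
        """Core points joined by links of strength >= eps (border/noise assignment omitted)."""
        parent = list(range(len(self.points)))
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for (q, r), s in self.links.items():
            if s >= self.eps and self.core[q] and self.core[r]:
                parent[find(q)] = find(r)
        groups = {}
        for i in range(len(self.points)):
            if self.core[i]:
                groups.setdefault(find(i), set()).add(i)
        return {frozenset(g) for g in groups.values()}


def build_static(points, k, eps, min_pts):
    """Non-incremental baseline: compute every property from scratch."""
    idx = SNNIndex(k, eps, min_pts)
    idx.points = list(points)
    n = len(points)
    idx.knn = [idx._rank(i, [j for j in range(n) if j != i]) for i in range(n)]
    for q in range(n):
        for r in idx.knn[q]:
            idx._set_link(q, r)
    idx.density, idx.core = [0] * n, [False] * n
    for q in range(n):
        idx._update_density(q)
    return idx
```

On toy data, one way to check the exactness idea is to build an index with `build_static` on a prefix of the points, call `insert` for each remaining point, and compare `knn`, `links`, `density`, `core` and `clusters()` with a fresh `build_static` over all points. The size of the set returned by `insert` shows how many points had their properties recomputed. The reasoning behind the targeting is that a kNN list can change only if the new point enters it. A link can change only if one of its endpoints had its kNN list changed. A density can change only at an endpoint of such a link.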
Pages: 1533-1536
Number of pages: 4