Incremental Shared Nearest Neighbor Density-Based Clustering

Cited by: 11
Authors
Singh, Sumeet [1]
Awekar, Amit [1]
Affiliations
[1] Indian Inst Technol, Gauhati, India
Keywords
Incremental clustering; Graph-based clustering; Shared Nearest Neighbor; Density-based clustering; Dynamic dataset;
DOI
10.1145/2505515.2507837
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Shared Nearest Neighbor Density-based clustering (SNN-DBSCAN) is a robust graph-based clustering algorithm with wide applications, from climate data analysis to network intrusion detection. We propose an incremental extension to this algorithm, IncSNN-DBSCAN, capable of finding clusters in a dataset to which frequent inserts are made. For each data point, the algorithm maintains four properties: nearest neighbor list, strengths of shared links, total connection strength, and topic property. The algorithm targets only those points whose properties change. We prove that, to obtain the exact clustering, it is sufficient to recompute properties for only the targeted points, followed by possible cluster mergers on newly formed links and cluster splits on deleted links. Experiments on the KDD Cup 1999 and Mopsi search engine 2012 datasets demonstrate 75% and 99% reductions, respectively, in the size of the set of points involved in property recomputations. By avoiding most of the redundant property computations, the algorithm achieves speedups of up to 250 and 1000 times, respectively, on those datasets, while generating exactly the same clustering as the non-incremental algorithm. We experimentally verify our claim for up to 2500 inserts on both datasets. However, the speedup comes at the cost of up to 48 times more memory usage.
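The abstract's central idea, that an insert disturbs the properties of only a subset of points, can be illustrated with a minimal Python sketch. This is not the authors' IncSNN-DBSCAN implementation. It is a brute-force toy that computes two of the four properties (nearest neighbor lists and shared-link strengths) and reports which existing points have a changed neighbor list after one insert. The function names, the value of `K`, the sample points, and the convention that a link exists only between mutual neighbors are all assumptions made for illustration.

```python
from math import dist

def knn_lists(points, k):
    """Return, for every point, the indices of its k nearest neighbors (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by distance; break ties by index so the output is deterministic.
        others.sort(key=lambda j: (dist(p, points[j]), j))
        result.append(others[:k])
    return result

def shared_link_strengths(knn):
    """Map each mutual-neighbor pair (i, j), i < j, to its number of shared neighbors.

    Assumption: a link exists only when i and j appear in each other's lists.
    """
    sets = [set(n) for n in knn]
    strengths = {}
    for i, neighbors in enumerate(knn):
        for j in neighbors:
            if i < j and i in sets[j]:
                strengths[(i, j)] = len(sets[i] & sets[j])
    return strengths

def points_with_changed_knn(old_knn, new_knn):
    """Indices of pre-existing points whose neighbor list differs after the insert."""
    return [i for i in range(len(old_knn)) if old_knn[i] != new_knn[i]]

# Hypothetical data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
K = 3  # illustrative neighborhood size, not a value from the paper

old_knn = knn_lists(points, K)
new_knn = knn_lists(points + [(0.5, 0.5)], K)  # insert one point near the first group
affected = points_with_changed_knn(old_knn, new_knn)
print(affected)
```

In this toy example only the four points of the first group are affected, so an incremental method would have no reason to revisit the second group. The sketch still recomputes every neighbor list from scratch in order to find the difference, and it ignores points whose link strengths change only indirectly; avoiding that redundant work while still producing the exact clustering is the contribution the abstract describes.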
Pages: 1533-1536
Number of pages: 4
Related papers
50 in total
  • [1] Batch Incremental Shared Nearest Neighbor Density Based Clustering Algorithm for Dynamic Datasets
    Bhattacharjee, Panthadeep
    Awekar, Amit
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017, 2017, 10193 : 568 - 574
  • [2] An Improved Clustering Algorithm Based on Density and Shared Nearest Neighbor
    Ye, Hanmin
    Lv, Hao
    Sun, Qianting
    2016 IEEE INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC), 2016, : 37 - 40
  • [3] Nearest neighbor - density-based clustering methods for large hyperspectral images
    Cariou, Claude
    Chehdi, Kacem
    IMAGE AND SIGNAL PROCESSING FOR REMOTE SENSING XXIII, 2017, 10427
  • [4] A novel density-based clustering algorithm using nearest neighbor graph
    Li, Hao
    Liu, Xiaojie
    Li, Tao
    Gan, Rundong
    PATTERN RECOGNITION, 2020, 102
  • [5] A dynamic density-based clustering method based on K-nearest neighbor
    Sorkhi, Mahshid Asghari
    Akbari, Ebrahim
    Rabbani, Mohsen
    Motameni, Homayun
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (05) : 3005 - 3031
  • [6] A dynamic density-based clustering method based on K-nearest neighbor
    Mahshid Asghari Sorkhi
    Ebrahim Akbari
    Mohsen Rabbani
    Homayun Motameni
    Knowledge and Information Systems, 2024, 66 : 3005 - 3031
  • [7] IMPROVED NEAREST NEIGHBOR DENSITY-BASED CLUSTERING TECHNIQUES WITH APPLICATION TO HYPERSPECTRAL IMAGES
    Cariou, Claude
    Chehdi, Kacem
    Le Moan, Steven
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4127 - 4131
  • [8] Evolving data stream clustering algorithm based on the shared nearest neighbor density
    Gao, Bing, 1703, University of Science and Technology Beijing (36):
  • [9] LDBNN: A Local Density-based Nearest Neighbor Classifier
    Carbonera, Joel Luis
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS 2021), VOL 1, 2021, : 395 - 401
  • [10] RNN-DBSCAN: A Density-Based Clustering Algorithm Using Reverse Nearest Neighbor Density Estimates
    Bryant, Avory
    Cios, Krzysztof
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (06) : 1109 - 1121