GriT-DBSCAN: A spatial clustering algorithm for very large databases

Cited by: 8
Authors
Huang, Xiaogang [1]
Ma, Tiefeng [1]
Liu, Conan [2]
Liu, Shuangzhe [3]
Affiliations
[1] Southwestern Univ Finance & Econ, Sch Stat, Chengdu 611130, Sichuan, Peoples R China
[2] Univ New South Wales, UNSW Business Sch, Sydney, NSW 2052, Australia
[3] Univ Canberra, Fac Sci & Technol, Canberra, ACT 2617, Australia
Keywords
DBSCAN; Clustering; Indexing methods; Spatial databases;
DOI
10.1016/j.patcog.2023.109658
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
DBSCAN is a fundamental spatial clustering algorithm with numerous practical applications. However, a bottleneck of DBSCAN is its O(n^2) worst-case time complexity. To address this limitation, we propose a new grid-based algorithm for exact DBSCAN in Euclidean space called GriT-DBSCAN, which is based on the following two techniques. First, we introduce a grid tree to organize the non-empty grids so that queries for non-empty neighboring grids can be answered efficiently. Second, by utilizing the spatial relationships among points, we propose a technique that iteratively prunes unnecessary distance calculations when determining whether the minimum distance between two sets is less than or equal to a certain threshold. We theoretically demonstrate that GriT-DBSCAN has excellent reliability in terms of time complexity. In addition, we obtain two variants of GriT-DBSCAN by incorporating heuristics, or by combining the second technique with an existing algorithm. Experiments are conducted on both synthetic and real-world data sets to evaluate the efficiency of GriT-DBSCAN and its variants. The results show that our algorithms outperform existing algorithms.
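
The two operations named in the abstract, finding non-empty neighboring grids and testing whether the minimum distance between two point sets is at most a threshold, can be illustrated with a minimal Python sketch. This is not the paper's method: it assumes a plain hash map in place of the grid tree and a brute-force distance test in place of the pruning technique, and the names `build_grid`, `neighboring_cells` and `sets_within` are hypothetical. It only shows what the two queries compute in a generic grid-based DBSCAN with cell side eps / sqrt(d).

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, eps):
    """Assign each point to a cell of side eps / sqrt(d) (assumed layout).

    With this side length, any two points in the same cell are within eps.
    """
    d = len(points[0])
    side = eps / math.sqrt(d)
    grid = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / side) for x in p)
        grid[key].append(p)
    return grid


def neighboring_cells(grid, key):
    """Return non-empty cells (other than `key`) that may hold points within eps.

    Hash-map stand-in for a grid tree query: enumerates every offset up to
    ceil(sqrt(d)) per dimension, which is exponential in d.
    """
    d = len(key)
    reach = math.ceil(math.sqrt(d))
    found = []
    for offset in product(range(-reach, reach + 1), repeat=d):
        cand = tuple(k + o for k, o in zip(key, offset))
        if cand != key and cand in grid:
            found.append(cand)
    return found


def sets_within(a, b, eps):
    """Brute-force check (no pruning): is min distance between a and b <= eps?"""
    eps2 = eps * eps
    return any(
        sum((x - y) ** 2 for x, y in zip(p, q)) <= eps2
        for p in a
        for q in b
    )


points = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (5.0, 5.0)]
grid = build_grid(points, eps=1.0)
near = neighboring_cells(grid, (0, 0))
```

In this example the far-away point falls in a cell that is never visited from cell `(0, 0)`, so only nearby cells are compared point by point. The offset enumeration grows exponentially with the dimension, which is the kind of cost an index over non-empty grids is meant to avoid.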
Pages: 13
Related papers
50 in total
  • [1] Combining sampling technique with DBSCAN algorithm for clustering large spatial databases
    Zhou, SG
    Zhou, AY
    Cao, J
    Wen, J
    Fan, Y
    Hu, YF
    KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS: CURRENT ISSUES AND NEW APPLICATIONS, 2000, 1805 : 169 - 172
  • [2] Scaling up the DBSCAN Algorithm for Clustering Large Spatial Databases Based on Sampling Technique
    Guan Ji hong 1
    2.State Key Laboratory of Software Engineering
    3.College of Remote Sensin
    Wuhan University Journal of Natural Sciences, 2001, (Z1) : 467 - 473
  • [3] Scaling up the DBSCAN algorithm for clustering large spatial databases based on sampling technique
    Ji-Hong, G.
    Shui-Geng, Z.
    Fu-Ling, B.
    Yan-Xiang, H.
    Wuhan University Journal of Natural Sciences, 2001, 6 (1-2) : 467 - 473
  • [4] Approaches for scaling DBSCAN algorithm to large spatial databases
    Zhou, AY
    Zhou, SG
    Cao, J
    Fan, Y
    Hu, YF
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2000, 15 (06) : 509 - 526
  • [5] Approaches for scaling DBSCAN algorithm to large spatial databases
    Aoying Zhou
    Shuigeng Zhou
    Jing Cao
    Ye Fan
    Yunfa Hu
    Journal of Computer Science and Technology, 2000, 15 : 509 - 526
  • [6] Approaches for Scaling DBSCAN Algorithm to Large Spatial Databases
    周傲英
    周水庚
    曹晶
    范晔
    胡运发
    Journal of Computer Science and Technology, 2000, (06) : 509 - 526
  • [7] WIDE: Clustering algorithm for very large databases
    School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China
    Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban), 2006, 7 (826-831):
  • [8] Scalable grid-based clustering algorithm for very large spatial databases
    Sun, Yufen
    Lu, Yansheng
    2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006, : 763 - 768
  • [9] A fast parallel clustering algorithm for large spatial databases
    Xu, XW
    Jäger, J
    Kriegel, HP
    DATA MINING AND KNOWLEDGE DISCOVERY, 1999, 3 (03) : 263 - 290
  • [10] A Fast Parallel Clustering Algorithm for Large Spatial Databases
    Xiaowei Xu
    Jochen Jäger
    Hans-Peter Kriegel
    Data Mining and Knowledge Discovery, 1999, 3 : 263 - 290