GriT-DBSCAN: A spatial clustering algorithm for very large databases

Cited by: 8
Authors
Huang, Xiaogang [1]
Ma, Tiefeng [1]
Liu, Conan [2]
Liu, Shuangzhe [3]
Affiliations
[1] Southwestern Univ Finance & Econ, Sch Stat, Chengdu 611130, Sichuan, Peoples R China
[2] Univ New South Wales, UNSW Business Sch, Sydney, NSW 2052, Australia
[3] Univ Canberra, Fac Sci & Technol, Canberra, ACT 2617, Australia
Keywords
DBSCAN; Clustering; Indexing methods; Spatial databases;
DOI
10.1016/j.patcog.2023.109658
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
DBSCAN is a fundamental spatial clustering algorithm with numerous practical applications. However, a bottleneck of DBSCAN is its O(n^2) worst-case time complexity. To address this limitation, we propose a new grid-based algorithm for exact DBSCAN in Euclidean space called GriT-DBSCAN, which is based on the following two techniques. First, we introduce a grid tree to organize the non-empty grids so that queries for non-empty neighboring grids are efficient. Second, by utilizing the spatial relationships among points, we propose a technique that iteratively prunes unnecessary distance calculations when determining whether the minimum distance between two sets is less than or equal to a certain threshold. We theoretically demonstrate that GriT-DBSCAN has excellent reliability in terms of time complexity. In addition, we obtain two variants of GriT-DBSCAN by incorporating heuristics, or by combining the second technique with an existing algorithm. Experiments are conducted on both synthetic and real-world data sets to evaluate the efficiency of GriT-DBSCAN and its variants. The results show that our algorithms outperform existing algorithms.
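The two operations the abstract names (finding the non-empty neighboring grids of a grid, and testing whether the minimum distance between two point sets is at most a threshold) can be illustrated with a minimal Python sketch. This is not the GriT-DBSCAN method: a plain hash map stands in for the grid tree, and a brute-force scan with early exit stands in for the pruning technique. All function names are hypothetical, and the cell width eps / sqrt(d) is an assumption that is common in grid-based DBSCAN but is not stated in the abstract.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, eps):
    """Map each non-empty cell (a tuple of integer indices) to its points."""
    d = len(points[0])
    # Assumed cell width: the cell diagonal is eps, so any two points
    # that share a cell are within eps of each other.
    side = eps / math.sqrt(d)
    grid = defaultdict(list)
    for p in points:
        grid[tuple(math.floor(x / side) for x in p)].append(p)
    return grid


def neighboring_cells(grid, cell):
    """Return the non-empty cells that may hold a point within eps of `cell`.

    Hash-map stand-in for a neighbor query; it enumerates a superset of
    the cells that can actually matter.
    """
    d = len(cell)
    # Cells more than `reach` steps away along any axis are farther than eps.
    reach = math.ceil(math.sqrt(d))
    found = []
    for offset in product(range(-reach, reach + 1), repeat=d):
        other = tuple(c + o for c, o in zip(cell, offset))
        if other != cell and other in grid:
            found.append(other)
    return found


def min_dist_within(a, b, eps):
    """Brute-force test: is the minimum distance between sets a and b <= eps?

    Stops at the first qualifying pair; no pruning beyond that early exit.
    """
    eps_sq = eps * eps
    for p in a:
        for q in b:
            if sum((x - y) ** 2 for x, y in zip(p, q)) <= eps_sq:
                return True
    return False


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (3.0, 3.0)]
    grid = build_grid(pts, eps=1.0)
    for other in neighboring_cells(grid, (0, 0)):
        print(other, min_dist_within(grid[(0, 0)], grid[other], 1.0))
```

The brute-force test costs up to |a| * |b| distance computations per pair of cells; reducing that count is the purpose of the second technique described in the abstract.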
Pages: 13
Related papers
50 in total
  • [21] Clustering in very large databases based on distance and density
    Weining Qian
    XueQing Gong
    AoYing Zhou
    Journal of Computer Science and Technology, 2003, 18 : 67 - 76
  • [22] Hybridized Fragmentation of Very Large Databases Using Clustering
    Harikumar, Sandhya
    Ramachandran, Raji
    2015 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, INFORMATICS, COMMUNICATION AND ENERGY SYSTEMS (SPICES), 2015,
  • [23] Cure: An efficient clustering algorithm for large databases
    Guha, S
    Rastogi, R
    Shim, K
    INFORMATION SYSTEMS, 2001, 26 (01) : 35 - 58
  • [24] Effective clustering algorithm in large transaction databases
    Chen, Ning
    Chen, An
    Zhou, Long-Xiang
    Ruan Jian Xue Bao/Journal of Software, 2001, 12 (04) : 475 - 484
  • [25] An Extended DBSCAN Clustering Algorithm
    Fahim, Ahmed
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (03) : 245 - 258
  • [26] Improved DBSCAN clustering algorithm
    Feng, Shao-Rong
    Xiao, Wen-Jun
    Zhongguo Kuangye Daxue Xuebao/Journal of China University of Mining and Technology, 2008, 37 (01) : 105 - 111
  • [27] GF-DBSCAN: A New Efficient and Effective Data Clustering Technique for Large Databases
    Tsai, Cheng-Fa
    Wu, Chien-Tsung
    MUSP '06: PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE ON MULTIMEDIA SYSTEMS AND SIGNAL PROCESSING, 2009, : 231 - +
  • [28] Research on Parallel Design of DBSCAN Clustering Algorithm in Spatial Data Mining
    Zhou, Gong-jian
    2018 INTERNATIONAL CONFERENCE ON ELECTRICAL, CONTROL, AUTOMATION AND ROBOTICS (ECAR 2018), 2018, 307 : 343 - 348
  • [29] DCAD: a Dual Clustering Algorithm for Distributed Spatial Databases
    Zhou Jiaogen
    Guan Jihong
    Li Pingxiang
    GEO-SPATIAL INFORMATION SCIENCE, 2007, 10 (02) : 137 - 144
  • [30] DCAD: a Dual Clustering Algorithm for Distributed Spatial Databases
    ZHOU Jiaogen GUAN Jihong LI Pingxiang ZHOU Jiaogen
    Geo-Spatial Information Science, 2007, (02) : 137 - 144