GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

Cited by: 9
Authors
Mansoori, Eghbal G. [1]
Affiliation
[1] Shiraz Univ, Sch Elect & Comp Engn, Shiraz, Iran
Keywords
Grid-based clustering; Hierarchical clustering; Feature selection; High-dimensional data
DOI
10.1007/s00500-013-1105-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a grid-based hierarchical clustering algorithm (GACH) as an efficient and robust method to explore clusters in high-dimensional data with no prior knowledge. It discovers the initial positions of the potential clusters automatically and then combines them hierarchically to obtain the final clusters. In this regard, GACH first projects the data patterns on a two-dimensional space (i.e., on a plane established by two features) to overcome the curse of dimensionality problem in high-dimensional data. To choose these two well-informed features, a simple and fast feature selection algorithm is proposed. Then, through meshing the plane with grid lines, GACH detects the crowded grid points. The nearest data patterns around these grid points are considered as initial members of some potential clusters. By returning the patterns back to their true dimensions, GACH refines these clusters. In the merging phase, GACH combines the closely adjacent clusters in a hierarchical bottom-up manner to construct the final clusters' members. The main features of GACH are: (1) it automatically discovers the clusters, (2) the obtained clusters are stable, (3) it is efficient for data sets with high dimensions, and (4) its merging process involves a threshold which can be obtained in advance for well-clustered data. To assess our proposed algorithm, it is applied on some benchmark data sets and the validity of obtained clusters is compared with the results of some other clustering algorithms. This comparison shows that GACH is accurate, efficient and feasible to discover clusters in high-dimensional data.
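The phases named in the abstract (two-feature projection, grid meshing, crowded grid points, refinement in the full space, bottom-up merging) can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not give the feature selection criterion, the grid resolution, the crowdedness rule, or the merging distance, so the variance-based feature ranking, the `bins` and `min_count` parameters, the centroid-distance merge rule, and every function name below are assumptions made for illustration only.

```python
import math
import random


def euclid(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def select_two_features(data):
    """ASSUMPTION: keep the two highest-variance features (needs >= 2 features).
    This is a stand-in; the paper's own selection rule is not in the abstract."""
    n = len(data)
    variances = []
    for col in zip(*data):
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return ranked[0], ranked[1]


def gach_sketch(data, bins=8, min_count=3, merge_threshold=3.0):
    """Hypothetical GACH-like pipeline; all parameters are illustrative."""
    f1, f2 = select_two_features(data)
    lo1, hi1 = min(r[f1] for r in data), max(r[f1] for r in data)
    lo2, hi2 = min(r[f2] for r in data), max(r[f2] for r in data)

    def snap(value, lo, hi):
        # Index of the nearest grid line along one axis.
        return 0 if hi == lo else round((value - lo) / (hi - lo) * bins)

    # Group pattern indices by their nearest grid point on the 2-D plane.
    grid = {}
    for i, row in enumerate(data):
        key = (snap(row[f1], lo1, hi1), snap(row[f2], lo2, hi2))
        grid.setdefault(key, []).append(i)

    # ASSUMPTION: a grid point is "crowded" if it attracts >= min_count patterns.
    seeds = [members for members in grid.values() if len(members) >= min_count]
    if not seeds:
        seeds = [list(range(len(data)))]
    centres = [centroid([data[i] for i in members]) for members in seeds]

    # Refinement: back in the full space, assign each pattern to the nearest seed.
    assigned = [[] for _ in centres]
    for i, row in enumerate(data):
        nearest = min(range(len(centres)), key=lambda c: euclid(row, centres[c]))
        assigned[nearest].append(i)
    clusters = [members for members in assigned if members]

    # Merging: repeatedly join the two closest clusters while their
    # centroid distance stays below the threshold (bottom-up).
    while len(clusters) > 1:
        centres = [centroid([data[i] for i in members]) for members in clusters]
        gap, a, b = min(
            (euclid(centres[a], centres[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if gap >= merge_threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(members) for members in clusters]


def demo_data(seed=0):
    """Two synthetic 5-D blobs of 30 patterns each (illustrative data only)."""
    rng = random.Random(seed)

    def blob(centre):
        return [[c + rng.uniform(-0.5, 0.5) for c in centre] for _ in range(30)]

    return blob([0, 0, 0, 0, 0]) + blob([10, 10, 0, 0, 0])
```

Calling `gach_sketch(demo_data())` groups the two synthetic blobs into two clusters of pattern indices. The merge threshold here is hand-picked for the toy data; the abstract states only that a threshold exists and can be obtained in advance for well-clustered data.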
Pages: 905-922
Number of pages: 18
Related papers
50 in total
  • [41] Grid-based clustering algorithm using fractal dimension
    Xiong, Xiao
    Zhang, Jie
    Journal of Information and Computational Science, 2007, 4 (03): : 997 - 1002
  • [42] A grid-based clustering algorithm for wild bird distribution
    Yuwei Wang
    Yuanchun Zhou
    Ying Liu
    Ze Luo
    Danhuai Guo
    Jing Shao
    Fei Tan
    Liang Wu
    Jianhui Li
    Baoping Yan
    Frontiers of Computer Science, 2013, 7 : 475 - 485
  • [43] Grid-based clustering algorithm for multi-density
    Qiu, BZ
    Zhang, XZ
    Shen, JY
    PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 1509 - 1512
  • [44] EDACluster: An evolutionary density and grid-based clustering algorithm
    De Oliveira, Cisar S.
    Godinho, Paulo Igor
    Meiguins, Aruanda S. G.
    Meiguins, Bianchi S.
    Freitas, Alex A.
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2007, : 143 - +
  • [45] A grid-based clustering algorithm for network anomaly detection
    Wei, Xiaotao
    Huang, Houkuan
    Tian, Shengfeng
    PROCEEDINGS OF THE FIRST INTERNATIONAL SYMPOSIUM ON DATA, PRIVACY, AND E-COMMERCE, 2007, : 104 - +
  • [46] A grid-based clustering algorithm with referential value of parameters
    Yantao, Zhou
    Xingdong, Yi
    Zhengguo, Wu
    2007 INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE & TECHNOLOGY, PROCEEDINGS, 2007, : 210 - 214
  • [47] An intelligent clustering algorithm for high-dimensional multiview data in big data applications
    Tao, Qian
    Gu, Chunqin
    Wang, Zhenyu
    Jiang, Daoning
    NEUROCOMPUTING, 2020, 393 : 234 - 244
  • [48] Accurate Grid-based Clustering Algorithm with Diagonal Grid Searching and Merging
    Liu, Feng
    Ye, Chengcheng
    Zhu, Erzhou
    2017 3RD INTERNATIONAL CONFERENCE ON APPLIED MATERIALS AND MANUFACTURING TECHNOLOGY (ICAMMT 2017), 2017, 242
  • [49] Clustering of High-Dimensional and Correlated Data
    McLachlan, Geoffrey J.
    Ng, Shu-Kay
    Wang, K.
    DATA ANALYSIS AND CLASSIFICATION, 2010, : 3 - 11
  • [50] Clustering in high-dimensional data spaces
    Murtagh, FD
    STATISTICAL CHALLENGES IN ASTRONOMY, 2003, : 279 - 292