GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

Cited by: 9
Authors
Mansoori, Eghbal G. [1]
Affiliations
[1] Shiraz Univ, Sch Elect & Comp Engn, Shiraz, Iran
Keywords
Grid-based clustering; Hierarchical clustering; Feature selection; High-dimensional data
DOI
10.1007/s00500-013-1105-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a grid-based hierarchical clustering algorithm (GACH) as an efficient and robust method to explore clusters in high-dimensional data with no prior knowledge. It discovers the initial positions of the potential clusters automatically and then combines them hierarchically to obtain the final clusters. In this regard, GACH first projects the data patterns on a two-dimensional space (i.e., on a plane established by two features) to overcome the curse of dimensionality problem in high-dimensional data. To choose these two well-informed features, a simple and fast feature selection algorithm is proposed. Then, through meshing the plane with grid lines, GACH detects the crowded grid points. The nearest data patterns around these grid points are considered as initial members of some potential clusters. By returning the patterns back to their true dimensions, GACH refines these clusters. In the merging phase, GACH combines the closely adjacent clusters in a hierarchical bottom-up manner to construct the final clusters' members. The main features of GACH are: (1) it automatically discovers the clusters, (2) the obtained clusters are stable, (3) it is efficient for data sets with high dimensions, and (4) its merging process involves a threshold which can be obtained in advance for well-clustered data. To assess our proposed algorithm, it is applied on some benchmark data sets and the validity of obtained clusters is compared with the results of some other clustering algorithms. This comparison shows that GACH is accurate, efficient and feasible to discover clusters in high-dimensional data.
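The pipeline described in the abstract (project onto two features, mesh the plane, find crowded grid points, then merge nearby clusters bottom-up) can be illustrated with a rough sketch. This is not the paper's implementation: the abstract does not state the feature selection criterion, the grid resolution, the crowding rule, or the merging distance, so the variance-based selection, the `n_lines` and `min_count` parameters, and the centroid-distance `threshold` below are all assumptions made for illustration. The refinement step in the full-dimensional space is omitted.

```python
import numpy as np

def pick_two_features(X):
    """Pick two features to form the projection plane.
    ASSUMPTION: highest variance is used as the criterion; the abstract
    does not say how GACH actually selects its two features."""
    variances = X.var(axis=0)
    top = np.argsort(variances)[::-1][:2]
    return np.sort(top)

def crowded_grid_points(P, n_lines=10, min_count=5):
    """Snap each 2-D point to its nearest grid point and report the grid
    points that attract at least `min_count` points (hypothetical rule)."""
    lo = P.min(axis=0)
    hi = P.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    idx = np.rint((P - lo) / span * (n_lines - 1)).astype(int)
    flat = idx[:, 0] * n_lines + idx[:, 1]  # one integer id per grid point
    counts = np.bincount(flat, minlength=n_lines * n_lines)
    crowded = np.flatnonzero(counts >= min_count)
    return flat, crowded

def initial_clusters(X, n_lines=10, min_count=5):
    """Label each pattern by the crowded grid point it snaps to; -1 = unassigned."""
    feats = pick_two_features(X)
    flat, crowded = crowded_grid_points(X[:, feats], n_lines, min_count)
    labels = np.full(len(X), -1)
    for k, g in enumerate(crowded):
        labels[flat == g] = k
    return feats, labels

def merge_close_clusters(X, labels, threshold):
    """Bottom-up merging in the full feature space.
    ASSUMPTION: clusters are merged while the closest pair of centroids
    is within `threshold` (Euclidean distance)."""
    labels = labels.copy()
    while True:
        ids = [k for k in np.unique(labels) if k >= 0]
        if len(ids) < 2:
            return labels
        cents = np.array([X[labels == k].mean(axis=0) for k in ids])
        d = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        if d[i, j] > threshold:
            return labels
        labels[labels == ids[j]] = ids[i]
```

A typical call would be `feats, labels = initial_clusters(X)` followed by `merge_close_clusters(X, labels, threshold)`, where `X` is an array of shape (patterns, features). Patterns labeled -1 did not fall on a crowded grid point and would need the refinement step that this sketch leaves out.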
Pages: 905 - 922
Page count: 18
Related papers
50 in total
  • [21] A fast consistent grid-based clustering algorithm
    Tarasenko, Anton S.
    Berikov, Vladimir B.
    Pestunov, Igor A.
    Rylov, Sergey A.
    Ruzankin, Pavel S.
    PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (04)
  • [22] A deflected grid-based algorithm for clustering analysis
    Department of Computer Science and Information Engineering, Tamkang University, 151 Ying-Chuan Road, Tamsui, Taipei County, Taiwan
    WSEAS Trans. Comput., 2008, 3 (125-132):
  • [23] A Hierarchical Model-based Approach to Co-Clustering High-Dimensional Data
    Costa, Gianni
    Manco, Giuseppe
    Ortale, Riccardo
    APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 886 - 890
  • [24] A density-based clustering algorithm for high-dimensional data with feature selection
    Qi Xianting
    Wang Pan
    2016 2ND INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS - COMPUTING TECHNOLOGY, INTELLIGENT TECHNOLOGY, INDUSTRIAL INFORMATION INTEGRATION (ICIICII), 2016, : 114 - 118
  • [25] High-dimensional data clustering
    Bouveyron, C.
    Girard, S.
    Schmid, C.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) : 502 - 519
  • [26] Clustering High-Dimensional Data
    Masulli, Francesco
    Rovetta, Stefano
    CLUSTERING HIGH-DIMENSIONAL DATA, CHDD 2012, 2015, 7627 : 1 - 13
  • [27] Hierarchical Clustering of High-Dimensional Data Without Global Dimensionality Reduction
    Kampman, Ilari
    Elomaa, Tapio
    FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2018), 2018, 11177 : 236 - 246
  • [28] A Clustering Algorithm for High-Dimensional Nonlinear Feature Data with Applications
    Jiang H.
    Wang G.
    Gao J.
    Gao Z.
    Gao R.
    Guo Q.
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2017, 51 (12): 49 - 55 and 90
  • [29] Non-parametric grid-based clustering algorithm for remote sensing data
    Pestunov, IA
    Sinyavsky, YN
    Proceedings of the Second IASTED International Multi-Conference on Automation, Control, and Information Technology - Signal and Image Processing, 2005, : 5 - 9
  • [30] A real-time grid-based clustering algorithm for large data set
    Yu, Zhiwen
    Wong, Hau-San
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS, 2006, : 740 - +