GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

Cited by: 9
Authors
Mansoori, Eghbal G. [1]
Affiliations
[1] Shiraz Univ, Sch Elect & Comp Engn, Shiraz, Iran
Keywords
Grid-based clustering; Hierarchical clustering; Feature selection; High-dimensional data;
DOI
10.1007/s00500-013-1105-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
This paper proposes a grid-based hierarchical clustering algorithm (GACH) as an efficient and robust method to explore clusters in high-dimensional data with no prior knowledge. It discovers the initial positions of the potential clusters automatically and then combines them hierarchically to obtain the final clusters. In this regard, GACH first projects the data patterns on a two-dimensional space (i.e., on a plane established by two features) to overcome the curse of dimensionality problem in high-dimensional data. To choose these two well-informed features, a simple and fast feature selection algorithm is proposed. Then, through meshing the plane with grid lines, GACH detects the crowded grid points. The nearest data patterns around these grid points are considered as initial members of some potential clusters. By returning the patterns back to their true dimensions, GACH refines these clusters. In the merging phase, GACH combines the closely adjacent clusters in a hierarchical bottom-up manner to construct the final clusters' members. The main features of GACH are: (1) it automatically discovers the clusters, (2) the obtained clusters are stable, (3) it is efficient for data sets with high dimensions, and (4) its merging process involves a threshold which can be obtained in advance for well-clustered data. To assess our proposed algorithm, it is applied on some benchmark data sets and the validity of obtained clusters is compared with the results of some other clustering algorithms. This comparison shows that GACH is accurate, efficient and feasible to discover clusters in high-dimensional data.
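The pipeline the abstract describes (project onto two features, mesh the plane, find crowded grid points, refine in the full dimensions, merge bottom-up) can be illustrated with a rough Python sketch. This is not the authors' implementation: the variance-based feature ranking, the centroid-distance merge rule, and the names `gach_sketch`, `grid_size`, `min_count` and `merge_threshold` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import defaultdict


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def gach_sketch(data, grid_size=4, min_count=3, merge_threshold=1.0):
    """Toy grid-then-merge clustering. Returns clusters as sorted lists of row indices."""
    n_features = len(data[0])
    # Stand-in feature selection (assumption): keep the two highest-variance features.
    ranked = sorted(range(n_features),
                    key=lambda j: variance([row[j] for row in data]),
                    reverse=True)
    f1, f2 = ranked[0], ranked[1]

    def snap(value, lo, hi):
        # Index of the nearest grid line along one axis.
        if hi == lo:
            return 0
        return round((value - lo) / (hi - lo) * (grid_size - 1))

    lo1, hi1 = min(r[f1] for r in data), max(r[f1] for r in data)
    lo2, hi2 = min(r[f2] for r in data), max(r[f2] for r in data)

    # Mesh the 2-D plane and attach every pattern to its nearest grid point.
    cells = defaultdict(list)
    for i, row in enumerate(data):
        cells[(snap(row[f1], lo1, hi1), snap(row[f2], lo2, hi2))].append(i)

    # Crowded grid points seed the potential clusters.
    seeds = [members for members in cells.values() if len(members) >= min_count]
    if not seeds:
        return [list(range(len(data)))]

    # Refinement in the full feature space: reassign each pattern
    # to the nearest seed centroid, measured over all features.
    centers = [centroid([data[i] for i in members]) for members in seeds]
    clusters = [[] for _ in centers]
    for i, row in enumerate(data):
        k = min(range(len(centers)), key=lambda c: math.dist(row, centers[c]))
        clusters[k].append(i)
    clusters = [c for c in clusters if c]

    # Bottom-up merging: join the closest pair of clusters until the
    # smallest centroid distance exceeds the threshold.
    while len(clusters) > 1:
        centers = [centroid([data[i] for i in c]) for c in clusters]
        d, a, b = min((math.dist(centers[a], centers[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        if d > merge_threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Two well-separated groups in three dimensions; the third feature is constant.
group_a = [[0.0, 0.0, 5.0], [0.1, 0.2, 5.0], [0.2, 0.1, 5.0], [0.1, 0.1, 5.0]]
group_b = [[10.0, 10.0, 5.0], [10.1, 10.2, 5.0], [10.2, 10.1, 5.0], [10.1, 10.1, 5.0]]
result = gach_sketch(group_a + group_b)
```

In this toy run the constant third feature is ignored during projection, and the hypothetical `merge_threshold` decides whether the two groups stay separate or are merged into one cluster.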
Pages: 905-922
Number of pages: 18
Related papers
50 in total
  • [1] GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data
    Eghbal G. Mansoori
    Soft Computing, 2014, 18 : 905 - 922
  • [2] A grid-based clustering algorithm for high-dimensional data streams
    Lu, YS
    Sun, YF
    Xu, GP
    Liu, G
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 824 - 831
  • [3] A grid-based subspace clustering algorithm for high-dimensional data streams
    Sun, Yufen
    Lu, Yansheng
    WEB INFORMATION SYSTEMS - WISE 2006 WORKSHOPS, PROCEEDINGS, 2006, 4256 : 37 - 48
  • [4] High-Dimensional Grid-based Clustering for Multispectral Satellite Image Segmentation
    Rylov, Sergey
    2020 VI INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND NANOTECHNOLOGY (IEEE ITNT-2020), 2020,
  • [5] Clustering algorithm of high-dimensional data based on units
    School of Information Engineering, Hubei Institute for Nationalities, Enshi 445000, China
    Jisuanji Yanjiu yu Fazhan, 2007, 9 (1618-1623): : 1618 - 1623
  • [6] Persistent homology based clustering algorithm for high-dimensional data
    Xiong Z.
    Wei Y.
    Xiong Z.
    He K.
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2024, 52 (02): : 29 - 35
  • [7] Grid-based Hierarchical Spatial Clustering Algorithm in Presence of Obstacle and Constraints
    Yang, Yue
    Zhang, Jian-pei
    Yang, Jing
    ICICSE: 2008 INTERNATIONAL CONFERENCE ON INTERNET COMPUTING IN SCIENCE AND ENGINEERING, PROCEEDINGS, 2008, : 383 - 388
  • [8] An algorithm for high-dimensional traffic data clustering
    Zheng, Pengjun
    McDonald, Mike
    FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4223 : 59 - 68
  • [9] A density grid-based uncertain data stream clustering algorithm
    Zhao, J. (jintianzhao@yahoo.com), 1600, Binary Information Press (10):
  • [10] Grid-based indexing and search algorithms for large-scale and high-dimensional data
    Yang, Chuanfu
    Li, Zhiyang
    Qu, Wenyu
    Liu, Zhaobin
    Qi, Heng
    2017 14TH INTERNATIONAL SYMPOSIUM ON PERVASIVE SYSTEMS, ALGORITHMS AND NETWORKS & 2017 11TH INTERNATIONAL CONFERENCE ON FRONTIER OF COMPUTER SCIENCE AND TECHNOLOGY & 2017 THIRD INTERNATIONAL SYMPOSIUM OF CREATIVE COMPUTING (ISPAN-FCST-ISCC), 2017, : 46 - 51