GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

Cited by: 9
Author
Mansoori, Eghbal G. [1]
Affiliation
[1] Shiraz Univ, Sch Elect & Comp Engn, Shiraz, Iran
Keywords
Grid-based clustering; Hierarchical clustering; Feature selection; High-dimensional data
DOI
10.1007/s00500-013-1105-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a grid-based hierarchical clustering algorithm (GACH) as an efficient and robust method for discovering clusters in high-dimensional data without prior knowledge. GACH automatically discovers the initial positions of potential clusters and then combines them hierarchically to obtain the final clusters. To overcome the curse of dimensionality, GACH first projects the data patterns onto a two-dimensional space (i.e., the plane spanned by two features). A simple and fast feature selection algorithm is proposed to choose these two informative features. GACH then meshes the plane with grid lines and detects the crowded grid points; the data patterns nearest to these grid points are taken as the initial members of potential clusters. By returning the patterns to their original dimensions, GACH refines these clusters. In the merging phase, GACH combines closely adjacent clusters in a hierarchical, bottom-up manner to construct the final clusters. The main features of GACH are: (1) it discovers clusters automatically, (2) the obtained clusters are stable, (3) it is efficient for high-dimensional data sets, and (4) its merging process involves a threshold that can be determined in advance for well-clustered data. To assess the proposed algorithm, it is applied to several benchmark data sets, and the validity of the obtained clusters is compared with the results of other clustering algorithms. This comparison shows that GACH is accurate, efficient, and feasible for discovering clusters in high-dimensional data.
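The pipeline summarized in the abstract (pick two features, project, grid, seed from crowded grid points, refine in full dimensions, merge bottom-up) can be illustrated with a minimal Python sketch. It is not the author's implementation, since the paper's exact feature-selection criterion, crowding rule, refinement step, and threshold are not given here. The sketch assumes: a variance-based stand-in for choosing the two features, snapping each projected pattern to its nearest grid point, treating grid points with at least `min_count` patterns as crowded, a single nearest-centroid reassignment as the refinement, and centroid-distance merging below `merge_threshold`. The function `gach_sketch` and all of its parameters are hypothetical.

```python
import math
from collections import defaultdict


def normalize(data):
    """Min-max scale every feature to [0, 1]; constant features become 0."""
    dims = len(data[0])
    lo = [min(row[j] for row in data) for j in range(dims)]
    hi = [max(row[j] for row in data) for j in range(dims)]
    return [
        [(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(dims)]
        for row in data
    ]


def pick_two_features(data):
    """ASSUMPTION: stand-in selector, not the paper's; picks the two highest-variance features."""
    n = len(data)

    def variance(j):
        mean = sum(row[j] for row in data) / n
        return sum((row[j] - mean) ** 2 for row in data) / n

    ranked = sorted(range(len(data[0])), key=variance, reverse=True)
    return ranked[0], ranked[1]


def centroid(points):
    return [sum(p[j] for p in points) / len(points) for j in range(len(points[0]))]


def gach_sketch(data, grid_size=10, min_count=3, merge_threshold=0.3):
    """Hypothetical GACH-style pipeline; every parameter here is illustrative."""
    x = normalize(data)
    f1, f2 = pick_two_features(x)

    # Project onto the (f1, f2) plane and snap each pattern to its nearest grid point.
    cells = defaultdict(list)
    for i, row in enumerate(x):
        key = (round(row[f1] * (grid_size - 1)), round(row[f2] * (grid_size - 1)))
        cells[key].append(i)

    # Crowded grid points (>= min_count patterns) seed the potential clusters.
    seeds = [members for members in cells.values() if len(members) >= min_count]
    if not seeds:
        return [0] * len(data)
    centers = [centroid([x[i] for i in members]) for members in seeds]

    # Refinement in the full space: one nearest-centroid reassignment of every pattern.
    assignment = [
        min(range(len(centers)), key=lambda c: math.dist(row, centers[c])) for row in x
    ]
    clusters = [[i for i, c in enumerate(assignment) if c == k] for k in range(len(centers))]
    clusters = [members for members in clusters if members]

    # Bottom-up merging: join the closest pair of centroids while it is below the threshold.
    while len(clusters) > 1:
        cents = [centroid([x[i] for i in members]) for members in clusters]
        pairs = [
            (math.dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        dist, a, b = min(pairs)
        if dist >= merge_threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid

    labels = [0] * len(data)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data: 4 features, two of them constant; group_b straddles two grid points.
group_a = [[0.2 * i, 0.2 * j, 7.0, 2.0] for i in range(3) for j in range(3)]
group_b = [[u, v, 7.0, 2.0] for u in (8.0, 8.3, 8.7, 9.0) for v in (8.8, 9.0)]
labels = gach_sketch(group_a + group_b)
print(labels)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
```

In this toy run the second group is first split across two crowded grid points, so it starts as two potential clusters; the merging loop then joins them because their centroids lie closer than `merge_threshold`, leaving two final clusters. Recomputing every pairwise centroid distance after each merge keeps the sketch short but is not tuned for large data sets.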
Pages: 905-922
Number of pages: 18
Related papers (50 in total)
  • [31] Paralinear distance and its algorithm for hierarchical clustering of high-dimensional discrete variables
    Wang, Shuai
    Hao, Lizhu
    Wang, Xiaofei
    Guo, Jianhua
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2024, 167
  • [32] Model based clustering of high-dimensional binary data
    Tang, Yang
    Browne, Ryan P.
    McNicholas, Paul D.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2015, 87 : 84 - 101
  • [33] Extended grid-based clustering algorithm with referential parameters
    Zhou, Yan-Tao
    Wu, Zheng-Guo
    Yi, Xing-Dong
    Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences, 2009, 36 (02) : 48 - 52
  • [34] Subspace clustering over high-dimensional data stream based on grid density and attribute relativity
    College of Information Science and Engineering, Yanshan University, Qinhuangdao City, 066004, China
    Unknown
    Adv. Inf. Sci. Serv. Sci., 17 (91-99):
  • [35] HSCFC: High-dimensional streaming data clustering algorithm based on feedback control system
    Ding, Guohui
    Wang, Yankai
    Li, Chenyang
    Sun, Haohan
    Li, Cailong
    Wang, Lei
    Yin, Haijun
    Huang, Tiantian
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 146 : 156 - 165
  • [36] A Valid Clustering Algorithm for High-dimensional Large Data Sets Based on Distributed Method
    Guo Xian e
    Yan Junmei
    PROCEEDINGS OF 2009 INTERNATIONAL WORKSHOP ON INFORMATION SECURITY AND APPLICATION, 2009 : 1 - 6
  • [37] A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
    Song, Qinbao
    Ni, Jingjie
    Wang, Guangtao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (01) : 1 - 14
  • [38] A grid-based clustering algorithm for wild bird distribution
    Wang, Yuwei
    Zhou, Yuanchun
    Liu, Ying
    Luo, Ze
    Guo, Danhuai
    Shao, Jing
    Tan, Fei
    Wu, Liang
    Li, Jianhui
    Yan, Baoping
    FRONTIERS OF COMPUTER SCIENCE, 2013, 7 (04) : 475 - 485
  • [39] A Clustering Algorithm of High-Dimensional Data Based on Sequential Psim Matrix and Differential Truncation
    Wang, Gongming
    Li, Wenfa
    Xu, Weizhi
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT II, 2018, 11335 : 297 - 307