Hierarchical Clustering Using Non-Greedy Principal Direction Divisive Partitioning

Authors
Martin Nilsson
Affiliation
[1] Los Alamos National Laboratory
Source
Information Retrieval | 2002 / Vol. 5
Keywords
clustering; taxonomy; PCA; classification
DOI
Not available
Abstract
We present a non-greedy version of the recently published Principal Direction Divisive Partitioning (PDDP) algorithm. The PDDP algorithm creates a hierarchical taxonomy of a data set by successively splitting the data into sub-clusters. At each level, the cluster with the largest variance is split by a hyperplane orthogonal to its leading principal component. The PDDP algorithm is known to produce high-quality clusters, especially when applied to high-dimensional data such as document-word feature matrices. It also scales well with both the size and the dimensionality of the data set. However, at each level only the locally optimal choice of splitting is considered. At a later stage this often leads to a non-optimal global partitioning of the data. The non-greedy version of the PDDP algorithm (NGPDDP) presented in this paper addresses this problem. At each level, multiple alternative splitting strategies are considered. Results from applying the algorithm to generated and real data (feature vectors from sets of text documents) are presented. The results show substantial improvements in cluster quality.
Pages: 311 - 321
Page count: 10
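
The greedy PDDP step summarized in the abstract (pick the cluster with the largest variance, then cut it with a hyperplane orthogonal to its leading principal component) can be illustrated with a minimal sketch. This is not the paper's code and it does not implement the non-greedy NGPDDP variant, whose alternative splitting strategies are not specified in this record. The function names `pddp_split`, `scatter`, and `greedy_pddp` are hypothetical, and two details are assumptions: "variance" is measured as total scatter (the sum of squared deviations from the centroid), and the hyperplane passes through the cluster centroid.

```python
import numpy as np


def pddp_split(X):
    """Split the rows of X by a hyperplane through the centroid,
    orthogonal to the leading principal direction (assumed placement)."""
    centered = X - X.mean(axis=0)
    # The first right singular vector of the centered data is the
    # leading principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[0]
    right = projections > 0
    return ~right, right


def scatter(X):
    """Total scatter: sum of squared deviations from the centroid
    (an assumed stand-in for the 'variance' used to pick a cluster)."""
    return float(((X - X.mean(axis=0)) ** 2).sum())


def greedy_pddp(X, n_clusters):
    """Greedy divisive partitioning: repeatedly split the cluster
    with the largest scatter until n_clusters clusters exist."""
    clusters = [np.arange(X.shape[0])]
    while len(clusters) < n_clusters:
        candidates = [i for i, idx in enumerate(clusters) if len(idx) > 1]
        if not candidates:
            break  # only singleton clusters remain
        worst = max(candidates, key=lambda i: scatter(X[clusters[i]]))
        idx = clusters.pop(worst)
        left, right = pddp_split(X[idx])
        if not left.any() or not right.any():
            # Degenerate case: all points coincide, nothing left to split.
            clusters.append(idx)
            break
        clusters.extend([idx[left], idx[right]])
    return clusters
```

Calling `greedy_pddp(X, k)` on an `(n_samples, n_features)` array returns a list of index arrays, one per leaf cluster. Because each call commits to the single locally best split, it exhibits exactly the greedy behavior the abstract identifies as the weakness that NGPDDP is meant to address.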