Approximating Dunn's Cluster Validity Indices for Partitions of Big Data

Cited by: 19
Authors
Rathore, Punit [1]
Ghafoori, Zahra [2]
Bezdek, James C. [2]
Palaniswami, Marimuthu [1]
Leckie, Christopher [2]
Affiliations
[1] Univ Melbourne, Dept Elect & Elect Engn, Parkville, Vic 3051, Australia
[2] Univ Melbourne, Sch Comp & Informat Syst, Parkville, Vic 3051, Australia
Keywords
Approximate Dunn's indices; big data; boundary point estimation; data skeleton; Dunn's index (DI); internal cluster validity; Maximin sampling; VALIDATION; NUMBER;
DOI
10.1109/TCYB.2018.2806886
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Dunn's internal cluster validity index is used to assess partition quality and subsequently identify a "best" crisp partition of n objects. Computing Dunn's index (DI) for partitions of n p-dimensional feature vector data has quadratic time complexity O(pn^2), so its computation is impractical for very large values of n. This note presents six methods for approximating DI. Four methods are based on Maximin sampling, which identifies a skeleton of the full partition that contains some boundary points in each cluster. Two additional methods are presented that estimate boundary points associated with unsupervised training of one-class support vector machines. Numerical examples compare approximations to DI based on all six methods. Four experiments on seven real and synthetic data sets support our assertion that computing approximations to DI with an incremental, neighborhood-based Maximin skeleton is both tractable and reliably accurate.
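
As a rough illustration of the two ideas the abstract names, the sketch below computes the classical Dunn's index (smallest between-cluster distance divided by largest cluster diameter) and draws a greedy Maximin sample. It assumes Euclidean distance and NumPy; the function names dunn_index and maximin_sample are hypothetical, and the code is not a reproduction of any of the six approximation methods in the paper. The all-pairs distance computation is what makes the exact index quadratic in n.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between rows of a (m x p) and rows of b (k x p)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def dunn_index(x, labels):
    """Classical Dunn's index for a crisp partition with at least two clusters.

    Assumes at least one cluster has two distinct points, so the largest
    diameter is nonzero.
    """
    clusters = [x[labels == c] for c in np.unique(labels)]
    # Largest within-cluster pairwise distance (diameter) over all clusters.
    max_diam = max(pairwise_dist(c, c).max() for c in clusters)
    # Smallest distance between two points that lie in different clusters.
    min_sep = min(
        pairwise_dist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_sep / max_diam


def maximin_sample(x, m, start=0):
    """Greedy Maximin sampling: repeatedly add the point farthest from the sample."""
    idx = [start]
    # d[i] is the distance from point i to its nearest already-selected point.
    d = np.linalg.norm(x - x[start], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(d))
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(x - x[nxt], axis=1))
    return np.array(idx)
```

Under these assumptions, one way to get a cheap estimate is to evaluate dunn_index(x[idx], labels[idx]) with idx = maximin_sample(x, m) for some m much smaller than n, provided every cluster is represented in the sample; how close such an estimate comes to the exact value depends on the data and on m.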
Pages: 1629-1641
Page count: 13
Related papers
50 in total
  • [21] Cluster validity indices for graph partitioning
    Boutin, F
    Hascoët, M
    EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION VISUALISATION, PROCEEDINGS, 2004, : 376 - 381
  • [22] New indices for cluster validity assessment
    Kim, M
    Ramakrishna, RS
    PATTERN RECOGNITION LETTERS, 2005, 26 (15) : 2353 - 2363
  • [23] Relational Generalizations of Cluster Validity Indices
    Sledge, Isaac J.
    Bezdek, James C.
    Havens, Timothy C.
    Keller, James M.
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2010, 18 (04) : 771 - 786
  • [24] Cluster validation indices for fMRI data: Fuzzy C-Means with feature partitions versus cluster merging strategies
    Alexiuk, MD
    Pizzi, NJ
    NAFIPS 2004: ANNUAL MEETING OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY, VOLS 1 AND 2: FUZZY SETS IN THE HEART OF THE CANADIAN ROCKIES, 2004, : 298 - 301
  • [25] A survey of cluster validity indices for automatic data clustering using differential evolution
    Jose-Garcia, Adan
    Gomez-Flores, Wilfrido
    PROCEEDINGS OF THE 2021 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO'21), 2021, : 314 - 322
  • [26] A note on cluster validity indices SV and OS
    Chen, Guang Hui
    INDUSTRIAL INSTRUMENTATION AND CONTROL SYSTEMS II, PTS 1-3, 2013, 336-338 : 2199 - 2202
  • [27] An extensive comparative study of cluster validity indices
    Arbelaitz, Olatz
    Gurrutxaga, Ibai
    Muguerza, Javier
    Perez, Jesus M.
    Perona, Inigo
    PATTERN RECOGNITION, 2013, 46 (01) : 243 - 256
  • [28] Shape-invariant cluster validity indices
    Frederix, G
    Pauwels, EJ
    ADVANCES IN DATA MINING: APPLICATIONS IN IMAGE MINING, MEDICINE AND BIOTECHNOLOGY, MANAGEMENT AND ENVIRONMENTAL CONTROL, AND TELECOMMUNICATIONS, 2004, 3275 : 96 - 105
  • [29] Some connectivity based cluster validity indices
    Saha, Sriparna
    Bandyopadhyay, Sanghamitra
    APPLIED SOFT COMPUTING, 2012, 12 (05) : 1555 - 1565
  • [30] Experiences with Approximating Queries in Microsoft's Production Big-Data Clusters
    Kandula, Srikanth
    Lee, Kukjin
    Chaudhuri, Surajit
    Friedman, Marc
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2019, 12 (12) : 2131 - 2142