Approximating Dunn's Cluster Validity Indices for Partitions of Big Data

Cited by: 19
Authors
Rathore, Punit [1]
Ghafoori, Zahra [2]
Bezdek, James C. [2]
Palaniswami, Marimuthu [1]
Leckie, Christopher [2]
Affiliations
[1] Univ Melbourne, Dept Elect & Elect Engn, Parkville, Vic 3051, Australia
[2] Univ Melbourne, Sch Comp & Informat Syst, Parkville, Vic 3051, Australia
Keywords
Approximate Dunn's indices; big data; boundary point estimation; data skeleton; Dunn's index (DI); internal cluster validity; Maximin sampling; VALIDATION; NUMBER
DOI
10.1109/TCYB.2018.2806886
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Dunn's internal cluster validity index is used to assess partition quality and subsequently identify a "best" crisp partition of n objects. Computing Dunn's index (DI) for partitions of n p-dimensional feature vectors has quadratic time complexity O(pn^2), so its computation is impractical for very large values of n. This note presents six methods for approximating DI. Four methods are based on Maximin sampling, which identifies a skeleton of the full partition that contains some boundary points in each cluster. Two additional methods estimate boundary points through unsupervised training of one-class support vector machines. Numerical examples compare approximations to DI based on all six methods. Four experiments on seven real and synthetic data sets support our assertion that computing approximations to DI with an incremental, neighborhood-based Maximin skeleton is both tractable and reliably accurate.
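To make the quadratic cost concrete, here is a minimal sketch of the two ingredients the abstract names: the classical Dunn's index (smallest between-cluster distance divided by largest cluster diameter) and a greedy Maximin (farthest-first) sampler. It assumes Euclidean distance and the standard textbook definitions; the function names `dunn_index` and `maximin_sample` are illustrative only, and this is not a reproduction of any of the six approximation methods in the paper.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Exact Dunn's index: min between-cluster distance / max cluster diameter.

    Compares every pair of points once, hence O(p * n^2) time.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")

    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("all clusters have zero diameter")

    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    return min_separation / max_diameter


def maximin_sample(points, k, start=0):
    """Greedy Maximin sampling: return the indices of k mutually distant points."""
    chosen = [start]
    # nearest[i] is the distance from point i to its closest chosen sample.
    nearest = [dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Pick the point farthest from everything chosen so far.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
    lab = [0, 0, 0, 1, 1, 1]
    print(dunn_index(pts, lab))
    print(maximin_sample(pts, 2))
```

Because Maximin sampling tends to pick points far from those already chosen, a sample of k points often lands near cluster boundaries. Evaluating a Dunn-type index on such a sample touches far fewer pairs than the full O(pn^2) computation, which is the general idea behind a skeleton-based approximation.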
Pages: 1629-1641
Page count: 13