Variable Weighting in Fuzzy k-Means Clustering to Determine the Number of Clusters

Cited by: 38

Authors:

Khan, Imran ^{[1]}

Luo, Zongwei ^{[1]}

Huang, Joshua Zhexue ^{[2]}

Shahzad, Waseem ^{[3]}

Affiliations:

[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen Key Lab Computat Intelligence, Shenzhen 518055, Guangdong, Peoples R China

[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Guangdong, Peoples R China

[3] Natl Univ Comp & Emerging Sci, Dept Comp Sci, Islamabad 44000, Pakistan

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2020 / Volume 32 / Issue 09

Keywords:

Fuzzy k-means; clustering; number of clusters; data mining; variable weighting; MEANS ALGORITHM; DATA SETS; SELECTION; CENTERS; MODEL;

DOI:

10.1109/TKDE.2019.2911582

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

One of the most significant problems in cluster analysis is to determine the number of clusters in unlabeled data, which is the input for most clustering algorithms. Some methods have been developed to address this problem. However, little attention has been paid to algorithms that are insensitive to the initialization of cluster centers and utilize variable weights to recover the number of clusters. To fill this gap, we extend the standard fuzzy k-means clustering algorithm. The extended algorithm can automatically determine the number of clusters by iteratively calculating the weights of all variables and the membership value of each object in all clusters. Two new steps are added to the fuzzy k-means clustering process. One is a penalty term that makes the clustering process insensitive to the initial cluster centers. The other is a formula for iteratively updating the variable weights in each cluster based on the current partition of the data. Experimental results on real-world and synthetic datasets show that the proposed algorithm effectively determines the correct number of clusters when initialized with different numbers of cluster centroids. We also tested the proposed algorithm on gene data to determine a subset of important genes.
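
For orientation, the following is a minimal sketch of the standard fuzzy k-means loop that the abstract says is being extended, with a per-variable weight in the distance. It is not the authors' algorithm: this record does not give the penalty term or the weight-update formula, so neither is reproduced. The weights `w` are assumed to be fixed and supplied by the caller, and all function names and parameters (`fuzzy_k_means`, `m`, `eps`, `tol`, `seed`) are illustrative choices.

```python
import random


def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one non-negative weight per variable."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def update_memberships(X, centers, w, m=2.0, eps=1e-12):
    """Standard fuzzy k-means update: u_ik proportional to D_ik^(-1/(m-1)),
    where D_ik is the (weighted) squared distance from object i to center k."""
    U = []
    for x in X:
        # eps avoids division by zero when an object coincides with a center
        d = [weighted_sq_dist(x, c, w) + eps for c in centers]
        inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
        total = sum(inv)
        U.append([v / total for v in inv])
    return U


def update_centers(X, U, k, m=2.0):
    """Each center is the mean of all objects, weighted by u_ik^m."""
    dims = len(X[0])
    centers = []
    for j in range(k):
        um = [row[j] ** m for row in U]
        denom = sum(um)
        centers.append(
            [sum(u * x[d] for u, x in zip(um, X)) / denom for d in range(dims)]
        )
    return centers


def fuzzy_k_means(X, k, w=None, m=2.0, iters=100, tol=1e-8, seed=0):
    """Alternate membership and center updates until the centers stop moving.
    Assumption: variable weights w stay fixed (the paper updates them instead)."""
    rng = random.Random(seed)
    w = w or [1.0] * len(X[0])
    centers = [list(x) for x in rng.sample(X, k)]
    ones = [1.0] * len(X[0])
    for _ in range(iters):
        U = update_memberships(X, centers, w, m)
        new_centers = update_centers(X, U, k, m)
        shift = max(weighted_sq_dist(a, b, ones) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, update_memberships(X, centers, w, m)
```

In this sketch the number of clusters `k` is still an input. According to the abstract, the proposed method differs precisely in that it recovers the number of clusters during the iterations and updates the variable weights from the current partition.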

Pages: 1838 - 1853

Number of pages: 16

50 records in total

[1] Agglomerative fuzzy K-Means clustering algorithm with selection of number of clusters
Li, Mark Junjie
Ng, Michael K.
Cheung, Yiu-ming
Huang, Joshua Zhexue
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (11) : 1519 - 1534
[2] Automated variable weighting in k-means type clustering
Huang, JZX
Ng, MK
Rong, HQ
Li, ZC
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2005, 27 (05) : 657 - 668
[3] Choosing the Number of Clusters in K-Means Clustering
Steinley, Douglas
Brusco, Michael J.
PSYCHOLOGICAL METHODS, 2011, 16 (03) : 285 - 297
[4] Setting the number of clusters in K-means clustering
Huh, MH
RECENT ADVANCES IN STATISTICAL RESEARCH AND DATA ANALYSIS, 2002, : 115 - 124
[5] An iterative algorithm for optimal variable weighting in K-means clustering
Zhang, Shaonan
Li, Shanshan
Hu, Jiaqiao
Xing, Haipeng
Zhu, Wei
COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2019, 48 (05) : 1346 - 1365
[6] Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering
Arima, Chinatsu
Hakamada, Kazumi
Okamoto, Masahiro
Hanai, Taizo
JOURNAL OF BIOSCIENCE AND BIOENGINEERING, 2008, 105 (03) : 273 - 281
[7] Feature weighting in k-means clustering
Modha, DS
Spangler, WS
MACHINE LEARNING, 2003, 52 (03) : 217 - 237
[8] Feature Weighting in k-Means Clustering
Dharmendra S. Modha
W. Scott Spangler
Machine Learning, 2003, 52 : 217 - 237
[9] Weighting variables in K-means clustering
Huh, Myung-Hoe
Lim, Yong B.
JOURNAL OF APPLIED STATISTICS, 2009, 36 (01) : 67 - 78
[10] Selection of Optimal Number of Clusters and Centroids for K-means and Fuzzy C-means Clustering: A Review
Pugazhenthi, A.
Kumar, Lakshmi Sutha
PROCEEDINGS OF THE 2020 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND SECURITY (ICCCS-2020), 2020,