A fuzzy C-means algorithm for optimizing data clustering

Cited by: 46
Authors
Hashemi, Seyed Emadedin [1]
Gholian-Jouybari, Fatemeh [1]
Hajiaghaei-Keshteli, Mostafa [1]
Affiliations
[1] Tecnol Monterrey, Escuela Ingn & Ciencias, Puebla, Mexico
Keywords
Whale optimization; FCM; Data clustering; Big Data; Fuzzy C-means clustering; INDEXES; SWARM; RAND;
DOI
10.1016/j.eswa.2023.120377
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Big data has become increasingly predominant in many research fields that affect human knowledge, including medicine and engineering. Cluster analysis, or clustering, is widely recognized as one of the most effective processes for dealing with various types of data, especially big data. There has been considerable interest in Fuzzy C-Means (FCM) as a method for clustering data using a short-distance approach in data mining. However, despite its simplicity, this method is not suitable for clustering large data sets due to their complex structure. In particular, FCM is sensitive to cluster center initialization, and an improper initialization can result in slow or non-optimal convergence. To solve the FCM convergence problem and find more appropriate cluster centers, optimization methods are generally used. In this study, a whale optimization algorithm is applied to solve the problem. As a solution to the problem of big data clustering, random sampling, clustering on samples, and extending the clustering results to all data are proposed. The proposed algorithm is implemented on several large data sets, both artificial and real, with many features, after normalization and standardization. To verify the validity and correctness of the performance of the proposed algorithm, the same data sets have been clustered with other known algorithms, and the results have been compared using several valid fuzzy indices. Based on the comparison results, it can be concluded that the proposed algorithm is more powerful and efficient than the other algorithms and, hence, can be used to effectively cluster large data sets. Our study can benefit organizations and managers who have a large amount of data and are unable to classify it or make use of it properly. Using big data takes a lot of time. The features of the proposed algorithm would be of great help to managers, allowing them to make better decisions and improve the quality of their work.
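The sample-then-extend idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`memberships`, `fcm`, `sample_then_extend`), the fuzzifier `m = 2`, the stopping tolerance, and the random choice of initial centers are all assumptions made for illustration. In particular, the whale optimization step that the paper uses to find better cluster centers is not shown here; plain random initialization stands in for it.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Standard FCM membership update: u_ik is proportional to d_ik^(-2/(m-1))."""
    # Squared Euclidean distances, shape (n_points, n_clusters).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # guard against division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, rng=None):
    """Plain fuzzy C-means with random initial centers (assumed; no WOA here)."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        W = memberships(X, centers, m) ** m
        # Each center is the membership-weighted mean of the points.
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers

def sample_then_extend(X, c, sample_size, m=2.0, seed=0):
    """Cluster a random sample, then compute memberships for all data."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=sample_size, replace=False)
    centers = fcm(X[idx], c, m=m, rng=rng)
    # Extension step: reuse the sample's centers for the full data set.
    return centers, memberships(X, centers, m)
```

Calling `sample_then_extend(X, c=3, sample_size=1000)` on a normalized array `X` of shape `(n_points, n_features)` returns the centers fitted on the sample and a membership matrix for every row of `X`. The iterative fitting only ever touches the sample, which is where the saving on large data sets would come from under these assumptions.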
Pages: 14
Related papers
50 in total
  • [1] Optimizing parameters of fuzzy c-means clustering algorithm
    Liu, Yongchao
    Zhang, Yunjie
    FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 1, PROCEEDINGS, 2007, : 633 - 638
  • [2] Optimizing of Fuzzy C-Means Clustering Algorithm Using GA
    Alata, Mohanad
    Molhim, Mohammad
    Ramini, Abdullah
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 29, 2008, 29 : 224 - 229
  • [3] A New Fuzzy c-Means Clustering Algorithm for Interval Data
    Jin, Yan
    Ma, Jianghong
    2013 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (ICCSAI 2013), 2013, : 156 - 159
  • [4] Fuzzy C-means clustering algorithm based on incomplete data
    Jia, Zhiping
    Yu, Zhiqiang
    Zhang, Chenghui
    2006 IEEE INTERNATIONAL CONFERENCE ON INFORMATION ACQUISITION, VOLS 1 AND 2, CONFERENCE PROCEEDINGS, 2006, : 600 - 604
  • [5] A Robust Fuzzy c-Means Clustering Algorithm for Incomplete Data
    Li, Jinhua
    Song, Shiji
    Zhang, Yuli
    Li, Kang
    INTELLIGENT COMPUTING, NETWORKED CONTROL, AND THEIR ENGINEERING APPLICATIONS, PT II, 2017, 762 : 3 - 12
  • [6] A possibilistic fuzzy c-means clustering algorithm
    Pal, NR
    Pal, K
    Keller, JM
    Bezdek, JC
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2005, 13 (04) : 517 - 530
  • [7] An efficient Fuzzy C-Means clustering algorithm
    Hung, MC
    Yang, DL
    2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 225 - 232
  • [8] An Improved Fuzzy C-means Clustering Algorithm
    Duan, Lingzi
    Yu, Fusheng
    Zhan, Li
    2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2016, : 1199 - 1204
  • [9] A novel fuzzy C-means clustering algorithm
    Li, Cuixia
    Yu, Jian
    ROUGH SETS AND KNOWLEDGE TECHNOLOGY, PROCEEDINGS, 2006, 4062 : 510 - 515
  • [10] The global Fuzzy C-Means clustering algorithm
    Wang, Weina
    Zhang, Yunjie
    Li, Yi
    Zhang, Xiaona
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 3604 - +