A fuzzy C-means algorithm for optimizing data clustering

Cited by: 46
Authors
Hashemi, Seyed Emadedin [1]
Gholian-Jouybari, Fatemeh [1]
Hajiaghaei-Keshteli, Mostafa [1]
Affiliations
[1] Tecnol Monterrey, Escuela Ingn & Ciencias, Puebla, Mexico
Keywords
Whale optimization; FCM; Data clustering; Big Data; Fuzzy C-means clustering; INDEXES; SWARM; RAND;
DOI
10.1016/j.eswa.2023.120377
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Big data has increasingly become predominant in many research fields affecting human knowledge, including medicine and engineering. Cluster analysis, or clustering, is widely recognized as one of the most effective processes to deal with various types of data, especially big data. There has been considerable interest in Fuzzy C-Means (FCM) as a method for clustering data using a short-distance approach in data mining. However, despite its simplicity, this method is not suitable for clustering large data sets due to their complex structure. In particular, FCM is sensitive to cluster center initialization, and an improper initialization can result in slow or non-optimal convergence. In order to solve the FCM convergence problem and find more appropriate cluster centers, optimization methods are generally used. In this study, a whale optimization algorithm is applied to solve the problem. As a solution to the problem of big data clustering, random sampling, clustering on samples, and extending the clustering results to all data are proposed. The proposed algorithm is implemented on several large data sets, both artificial and real, with many features after normalization and standardization. To verify the validity and correctness of the performance of the proposed algorithm, the same data sets have been clustered with other known algorithms, and the results compared using several valid fuzzy indices. Based on the comparison results, it can be concluded that the proposed algorithm is more powerful and efficient than other algorithms and, hence, can be used to effectively cluster large data sets. Our study can benefit organizations and managers who have a large amount of data and are unable to classify or make use of them properly. Using big data takes a lot of time. The features of the proposed algorithm would be of great help to managers allowing them to make better decisions and improve the quality of their work.
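The sample-then-extend idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it uses the textbook FCM update rules, and it replaces the whale optimization step with plain random initialization from the data, since the abstract gives no details of how the optimizer is configured. The function names (`fcm_memberships`, `fcm`, `sample_then_extend`), the fuzzifier `m = 2.0`, the tolerance, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np


def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Textbook FCM membership update; returns U with shape (n_points, n_clusters)."""
    # Squared Euclidean distance from every point to every center.
    d2 = np.stack([((X - c) ** 2).sum(axis=1) for c in centers], axis=1)
    d2 = np.maximum(d2, eps)  # guard against a point sitting exactly on a center
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fcm(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, rng=None):
    """Plain FCM. Centers start at randomly chosen data points (an assumption:
    the paper uses a whale optimization algorithm to find better centers)."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(max_iter):
        U = fcm_memberships(X, centers, m)
        W = U ** m
        # Each center is the mean of the data weighted by membership^m.
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers


def sample_then_extend(X, n_clusters, sample_size, m=2.0, seed=0):
    """Cluster a random sample, then extend the result to all of X."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    centers = fcm(X[idx], n_clusters, m=m, rng=rng)
    # Extension step: one membership update of the full data set
    # against the centers learned from the sample.
    U = fcm_memberships(X, centers, m)
    return centers, U
```

A call such as `centers, U = sample_then_extend(X, n_clusters=3, sample_size=1000)` returns the centers learned from the sample and a fuzzy membership matrix for every row of `X`. The iterative part touches only the sample, and the full data set is visited once, which is where the time saving on large data would come from under these assumptions.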
Pages: 14
Related papers
50 items in total
  • [31] Possibilistic Rough Fuzzy C-Means Algorithm in Data Clustering and Image Segmentation
    Tripathy, B. K.
    Tripathy, Anurag
    Rajulu, Kosireddy Govinda
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (IEEE ICCIC), 2014, : 981 - 986
  • [32] A Comparison of Validity Indices on Fuzzy C-Means Clustering Algorithm for Directional Data
    Kesemen, Orhan
    Tezel, Ozge
    Ozkul, Eda
    Tiryaki, Bugra Kaan
    Agayev, Elcin
    2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
  • [33] An Adaptive Multiobjective Genetic Algorithm with Fuzzy c-Means for Automatic Data Clustering
    Dong, Ze
    Jia, Hao
    Liu, Miao
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2018, 2018
  • [34] MODIFIED POSSIBILISTIC FUZZY C-MEANS ALGORITHM FOR CLUSTERING INCOMPLETE DATA SETS
    Rustam
    Usman, Koredianto
    Kamaruddin, Mudyawati
    Chamidah, Dina
    Nopendri
    Saleh, Khaerudin
    Eliskar, Yulinda
    Marzuki, Ismail
    ACTA POLYTECHNICA, 2021, 61 (02) : 364 - 377
  • [35] A weighted fuzzy c-means clustering model for fuzzy data
    D'Urso, P
    Giordani, P
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2006, 50 (06) : 1496 - 1523
  • [36] A Weighted Fuzzy c-Means Clustering Algorithm for Incomplete Big Sensor Data
    Li, Peng
    Chen, Zhikui
    Hu, Yueming
    Leng, Yonglin
    Li, Qiucen
    WIRELESS SENSOR NETWORKS (CWSN 2017), 2018, 812 : 55 - 63
  • [37] AN IMPROVED ALGORITHM FOR SUPERVISED FUZZY C-MEANS CLUSTERING OF REMOTELY SENSED DATA
    ZHANG Jingxiong
    Roger P Kirby
    Geo-Spatial Information Science, 2000, (01) : 39 - 44
  • [38] Locating clusters in noisy data: A genetic fuzzy c-means clustering algorithm
    Egan, MA
    1998 CONFERENCE OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY - NAFIPS, 1998, : 178 - 182
  • [39] Hybrid Fuzzy C-Means Clustering Algorithm Oriented to Big Data Realms
    Perez-Ortega, Joaquin
    Silvia Roblero-Aguilar, Sandra
    Nely Almanza-Ortega, Nelva
    Frausto Solis, Juan
    Zavala-Diaz, Crispin
    Hernandez, Yasmin
    Landero-Najera, Vanesa
    AXIOMS, 2022, 11 (08)
  • [40] A new fuzzy relational clustering algorithm based on the fuzzy C-means algorithm
    Corsini, P
    Lazzerini, B
    Marcelloni, F
    SOFT COMPUTING, 2005, 9 (06) : 439 - 447