A novel rough semi-supervised k-means algorithm for text clustering

Cited by: 4
Authors
Tang, Lei-yu [1 ]
Wang, Zhen-hao [1 ]
Wang, Shu-dong [2 ]
Fan, Jian-cong [1 ]
Yue, Guo-wei [3 ]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China
[2] China Univ Petr, Coll Comp Sci & Technol, Qingdao, Peoples R China
[3] Shandong Univ Sci & Technol, Key Lab Min Disaster Prevent & Control, Qingdao 266590, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
rough set; approximation set; k-means algorithm; semi-supervised clustering; high dimensional sparse data; MODEL;
DOI
10.1504/IJBIC.2023.130548
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Because many attribute values in high-dimensional sparse data are zero, we combine the approximation sets of rough set theory with the semi-supervised k-means algorithm and propose a rough set-based semi-supervised k-means (RSKmeans) algorithm. First, the proportion of non-zero values is calculated from a few labelled data samples, and a small number of important attributes in each cluster are selected to calculate the clustering centres. Second, the approximation set is used to calculate the information gain of each attribute. Third, the attribute values are partitioned into the corresponding approximation sets by comparing the information gain with the upper approximation and boundary threshold. New attributes are then added, and the process is repeated to update the clustering centres. The experimental results on text data show that the RSKmeans algorithm helps find the important attributes, filters out invalid information, and significantly improves clustering performance.
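The first step of the abstract (non-zero proportions from labelled samples, selection of a few attributes per cluster, initial centres) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the top-m selection rule, the Euclidean nearest-centre assignment, the dense NumPy arrays, and the toy data are all assumptions, and the information-gain and approximation-set steps are omitted because the abstract does not specify them.

```python
import numpy as np

def nonzero_proportions(X_lab, y_lab, k):
    """Per cluster, the fraction of labelled samples with a non-zero value in each attribute."""
    props = np.zeros((k, X_lab.shape[1]))
    for c in range(k):
        members = X_lab[y_lab == c]  # assumes every cluster has at least one labelled sample
        props[c] = (members != 0).mean(axis=0)
    return props

def select_attributes(props, m):
    """Assumed selection rule: the m attributes with the highest non-zero proportion per cluster."""
    return np.argsort(-props, axis=1, kind="stable")[:, :m]

def initial_centres(X_lab, y_lab, selected):
    """Cluster means over the selected attributes only; all other coordinates stay zero."""
    k, d = selected.shape[0], X_lab.shape[1]
    centres = np.zeros((k, d))
    for c in range(k):
        members = X_lab[y_lab == c]
        centres[c, selected[c]] = members[:, selected[c]].mean(axis=0)
    return centres

def assign(X, centres):
    """Plain k-means assignment: index of the nearest centre by Euclidean distance."""
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical toy data: 4 labelled documents, 4 term attributes, 2 clusters.
X_lab = np.array([[1.0, 0.0, 0.0, 2.0],
                  [3.0, 0.0, 0.0, 0.0],
                  [0.0, 4.0, 1.0, 0.0],
                  [0.0, 2.0, 0.0, 0.0]])
y_lab = np.array([0, 0, 1, 1])
X_unlab = np.array([[2.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 1.0, 0.0]])

props = nonzero_proportions(X_lab, y_lab, k=2)
selected = select_attributes(props, m=2)
centres = initial_centres(X_lab, y_lab, selected)
labels = assign(X_unlab, centres)
```

In a fuller version, the assignment and centre update would alternate until convergence, with further attributes admitted between rounds as the abstract describes; real text data would normally be held in a sparse matrix rather than a dense array.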
Pages: 57-68
Number of pages: 13
Related papers
50 in total
  • [31] Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data
    Jiang, Yu
    Yu, Dengwen
    Zhao, Mingzhao
    Bai, Hongtao
    Wang, Chong
    He, Lili
    CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 64 (01): : 207 - 216
  • [32] Chinese text clustering algorithm based k-means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    2012 INTERNATIONAL CONFERENCE ON MEDICAL PHYSICS AND BIOMEDICAL ENGINEERING (ICMPBE2012), 2012, 33 : 301 - 307
  • [33] Chinese Text Clustering Algorithm Based K-Means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    2011 AASRI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRY APPLICATION (AASRI-AIIA 2011), VOL 1, 2011, : 90 - 93
  • [34] Weighted k-Means Algorithm Based Text Clustering
    Chen, Xiuguo
    Yin, Wensheng
    Tu, Pinghui
    Zhang, Hengxi
    IEEC 2009: FIRST INTERNATIONAL SYMPOSIUM ON INFORMATION ENGINEERING AND ELECTRONIC COMMERCE, PROCEEDINGS, 2009, : 51 - +
  • [35] Improved K-Means algorithm in text semantic clustering
    Ma, Junhong
    Open Cybernetics and Systemics Journal, 2014, 8 : 530 - 534
  • [36] Improved rough K-means clustering algorithm based on firefly algorithm
    Ye, Tingyu
    Ye, Jun
    Wang, Lei
    INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2023, 17 (01) : 1 - 12
  • [37] A Novel ELM K-Means Algorithm for Clustering
    Alshamiri, Abobakr Khalil
    Surampudi, Bapi Raju
    Singh, Alok
    SWARM, EVOLUTIONARY, AND MEMETIC COMPUTING, SEMCCO 2014, 2015, 8947 : 212 - 222
  • [38] Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm
    Shi Na
    Liu Xumin
    Guan Yong
    2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010, : 63 - 67
  • [39] A Novel Architecture for k-means Clustering Algorithm
    Khawaja, S. G.
    Khan, Asad Mansoor
    Akram, M. Usman
    Khan, Shoab A.
    PROCEEDINGS OF THE THIRD INTERNATIONAL AFRO-EUROPEAN CONFERENCE FOR INDUSTRIAL ADVANCEMENT-AECIA 2016, 2018, 565 : 311 - 320
  • [40] A novel method for K-means clustering algorithm
    Zhao, Jinguo, 1600, Transport and Telecommunication Institute, Lomonosova street 1, Riga, LV-1019, Latvia (18):