A novel rough semi-supervised k-means algorithm for text clustering

Cited by: 4
Authors
Tang, Lei-yu [1]
Wang, Zhen-hao [1]
Wang, Shu-dong [2]
Fan, Jian-cong [1]
Yue, Guo-wei [3]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China
[2] China Univ Petr, Coll Comp Sci & Technol, Qingdao, Peoples R China
[3] Shandong Univ Sci & Technol, Key Lab Min Disaster Prevent & Control, Qingdao 266590, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
rough set; approximation set; k-means algorithm; semi-supervised clustering; high dimensional sparse data; MODEL;
DOI
10.1504/IJBIC.2023.130548
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Since many attribute values of high-dimensional sparse data are zero, we combine the approximation set of rough set theory with the semi-supervised k-means algorithm and propose a rough set-based semi-supervised k-means (RSKmeans) algorithm. First, the proportion of non-zero values is calculated from a few labelled data samples, and a small number of important attributes in each cluster are selected to calculate the clustering centres. Second, the approximation set is used to calculate the information gain of each attribute. Third, different attribute values are partitioned into the corresponding approximation sets by comparing the information gain with the upper approximation and boundary threshold. New attributes are then added, and the above process is repeated to update the clustering centres. Experimental results on text data show that the RSKmeans algorithm helps find the important attributes, filters out invalid information, and significantly improves performance.
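The first step of the abstract (non-zero proportions from labelled samples, then cluster centres built from a few important attributes) can be illustrated with a minimal sketch. This is not the authors' RSKmeans implementation: the function names, the `top_m` parameter, the toy data, and the use of Euclidean distance are all assumptions made for illustration, and the information-gain and approximation-set steps are omitted because the abstract does not specify them.

```python
import numpy as np

def nonzero_proportion(X_lab, y_lab, k):
    """Per cluster, the fraction of labelled samples with a non-zero value in each attribute."""
    props = np.zeros((k, X_lab.shape[1]))
    for c in range(k):
        members = X_lab[y_lab == c]          # assumes every cluster has a labelled sample
        props[c] = (members != 0).mean(axis=0)
    return props

def select_attributes(props, top_m):
    """Indices of the top_m attributes with the highest non-zero proportion, per cluster."""
    return np.argsort(-props, axis=1, kind="stable")[:, :top_m]

def seeded_centres(X_lab, y_lab, selected):
    """Centres from labelled means, keeping only each cluster's selected attributes."""
    k, d = selected.shape[0], X_lab.shape[1]
    centres = np.zeros((k, d))
    for c in range(k):
        mean_c = X_lab[y_lab == c].mean(axis=0)
        centres[c, selected[c]] = mean_c[selected[c]]
    return centres

def assign(X, centres):
    """Assign each sample to its nearest centre (Euclidean distance, an assumption)."""
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical toy term-count data: 4 labelled documents, 4 attributes, 2 clusters.
X_lab = np.array([[1, 0, 0, 2],
                  [2, 0, 0, 1],
                  [0, 3, 1, 0],
                  [0, 1, 2, 0]], dtype=float)
y_lab = np.array([0, 0, 1, 1])
X_unlabelled = np.array([[1, 0, 0, 1],
                         [0, 2, 2, 0]], dtype=float)

props = nonzero_proportion(X_lab, y_lab, k=2)
selected = select_attributes(props, top_m=2)
centres = seeded_centres(X_lab, y_lab, selected)
labels = assign(X_unlabelled, centres)
print(selected, labels)
```

Zeroing the unselected attributes in each centre is one simple way to make the centres ignore attributes that are mostly zero within a cluster; the paper may handle this differently.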
Pages: 57 - 68
Number of pages: 13
Related papers
50 in total
  • [41] Semi-Supervised K-Means DDoS Detection Method Using Hybrid Feature Selection Algorithm
    Gu, Yonghao
    Li, Kaiyue
    Guo, Zhenyang
    Wang, Yongfei
    IEEE ACCESS, 2019, 7 : 64351 - 64365
  • [42] Performance-enhanced rough k-means clustering algorithm
    Sivaguru, M.
    Punniyamoorthy, M.
    SOFT COMPUTING, 2021, 25 (02) : 1595 - 1616
  • [43] Effects of Semi-supervised Learning on Rough Membership C-Means Clustering
    Shimizu, Takeaki
    Ubukata, Seiki
    Notsu, Akira
    Honda, Katsuhiro
    2019 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY), 2019, : 15 - 20
  • [44] An Adaptive Robust Semi-Supervised Clustering Framework Using Weighted Consensus of Random k-Means Ensemble
    Lai, Yongxuan
    He, Songyao
    Lin, Zhijie
    Yang, Fan
    Zhou, Qifeng
    Zhou, Xiaofang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (05) : 1877 - 1890
  • [45] RETRACTED: Research on semi supervised K-means clustering algorithm in data mining (Retracted Article)
    Mai, Xiaodong
    Cheng, Jiangke
    Wang, Shengnan
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (02) : S3513 - S3520
  • [46] Outliers in rough k-means clustering
    Peters, G
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2005, 3776 : 702 - 707
  • [47] The Rough Membership k-Means Clustering
    Ubukata, Seiki
    Notsu, Akira
    Honda, Katsuhiro
    INTEGRATED UNCERTAINTY IN KNOWLEDGE MODELLING AND DECISION MAKING, IUKM 2016, 2016, 9978 : 207 - 216
  • [48] The Rough Set k-Means Clustering
    Ubukata, Seiki
    Notsu, Akira
    Honda, Katsuhiro
    2016 JOINT 8TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 17TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2016, : 189 - 193
  • [49] Evolutionary Rough K-Means Clustering
    Lingras, Pawan
    ROUGH SETS AND KNOWLEDGE TECHNOLOGY, PROCEEDINGS, 2009, 5589 : 68 - 75
  • [50] Adapting k-means for supervised clustering
    Al-Harbi, S. H.
    Rayward-Smith, V. J.
    APPLIED INTELLIGENCE, 2006, 24 : 219 - 226