K-modestream algorithm for clustering categorical data streams

Authors
Ravi Sankar Sangam
Hari Om
Affiliation
[1] Indian Institute of Technology (Indian School of Mines), Department of Computer Science and Engineering
Keywords
Data mining; Data streams; Clustering; K-modes
DOI
10.1007/s40012-017-0170-z
Abstract
Clustering categorical data streams is a challenging problem because new data points are continuously added to the existing database at a rapid pace and categorical values have no natural order. Several algorithms have recently been proposed to tackle the problem of clustering categorical data streams. However, all of these schemes require the user to pre-specify the number of clusters, which is not trivial and makes them inefficient in a data stream environment. In this paper, we propose a new clustering algorithm, named k-modestream, which follows the k-modes algorithm paradigm to dynamically cluster categorical data streams. It automatically computes the number of clusters and their initial modes simultaneously at regular time intervals. We analyse the time complexity of our scheme and perform various experiments on synthetic and real-world datasets to evaluate its efficacy.
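
For orientation, the k-modes paradigm mentioned in the abstract can be sketched as follows. This is a minimal illustration of the classic k-modes loop (simple matching dissimilarity, nearest-mode assignment, per-attribute mode update), not the authors' k-modestream algorithm: the abstract does not describe how the number of clusters and the initial modes are computed, so this sketch assumes the initial modes are supplied by the caller. All function names (`hamming`, `assign`, `update_modes`, `k_modes`) are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: count of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(points, modes):
    """For each point, return the index of its closest mode (ties go to the lowest index)."""
    return [min(range(len(modes)), key=lambda j: hamming(p, modes[j])) for p in points]


def update_modes(points, labels, modes):
    """Recompute each mode as the most frequent category per attribute within its cluster."""
    new_modes = []
    for j, old in enumerate(modes):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_modes.append(old)  # assumption: an empty cluster keeps its previous mode
            continue
        new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
    return new_modes


def k_modes(points, modes, max_iter=20):
    """Alternate assignment and mode update until labels stop changing or max_iter is hit."""
    labels = assign(points, modes)
    for _ in range(max_iter):
        modes = update_modes(points, labels, modes)
        new_labels = assign(points, modes)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, modes


# Toy data: six records with three categorical attributes each.
points = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r"),
]
labels, modes = k_modes(points, [points[0], points[3]])
```

In a streaming setting, a loop of this kind would presumably be rerun on each new batch of records; how k-modestream selects the number of clusters and the initial modes at each interval is specified in the paper itself, not here.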
Pages: 295-303
Number of pages: 8