K-modestream algorithm for clustering categorical data streams

被引:0
|
作者
Ravi Sankar Sangam
Hari Om
机构
[1] Indian Institute of Technology (Indian School of Mines),Department of Computer Science and Engineering
关键词
Data mining; Data streams; Clustering; K-modes;
D O I
10.1007/s40012-017-0170-z
中图分类号
学科分类号
摘要
Clustering categorical data streams is a challenging problem because new data points are continuously adding to the already existing database at rapid pace and there exists no natural order among the categorical values. Recently, some algorithms have been discussed to tackle the problem of clustering the categorical data streams. However, in all these schemes the user needs to pre-specify the number of clusters, which is not trivial, and it renders to inefficient in the data stream environment. In this paper, we propose a new clustering algorithm, named it as k-modestream, which follows the k-modes algorithm paradigm to dynamically cluster the categorical data streams. It automatically computes the number of clusters and their initial modes simultaneously at regular time intervals. We analyse the time complexity of our scheme and perform various experiments using the synthetic and real world datasets to evaluate its efficacy.
引用
收藏
页码:295 / 303
页数:8
相关论文
共 50 条
  • [21] A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data
    Ji, Jinchao
    Pang, Wei
    Zhou, Chunguang
    Han, Xiao
    Wang, Zhe
    KNOWLEDGE-BASED SYSTEMS, 2012, 30 : 129 - 135
  • [22] An improved k-prototypes clustering algorithm for mixed numeric and categorical data
    Ji, Jinchao
    Bai, Tian
    Zhou, Chunguang
    Ma, Chao
    Wang, Zhe
    NEUROCOMPUTING, 2013, 120 : 590 - 596
  • [23] An Optimization Model for Clustering Categorical Data Streams with Drifting Concepts
    Bai, Liang
    Cheng, Xueqi
    Liang, Jiye
    Shen, Huawei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (11) : 2871 - 2883
  • [24] Kernel Subspace Clustering Algorithm for Categorical Data
    Xu K.-P.
    Chen L.-F.
    Sun H.-J.
    Wang B.-Z.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (11): : 3492 - 3505
  • [25] A hierarchical clustering algorithm for categorical sequence data
    Oh, SJ
    Kim, JY
    INFORMATION PROCESSING LETTERS, 2004, 91 (03) : 135 - 140
  • [26] Squeezer: An efficient algorithm for clustering categorical data
    He, ZY
    Xu, XF
    Deng, SC
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2002, 17 (05) : 611 - 624
  • [27] Coercion: A Distributed Clustering Algorithm for Categorical Data
    Wang, Bin
    Zhou, Yang
    Hei, Xinhong
    2013 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2013, : 683 - 687
  • [28] A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA
    Ohn Mar San
    Van-Nam Huynh
    Yoshiteru Nakamori
    JournalofSystemsScienceandComplexity, 2003, (04) : 562 - 571
  • [29] A parallel clustering algorithm for categorical data set
    Wang, YX
    Wang, ZH
    Li, XM
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING - ICAISC 2004, 2004, 3070 : 928 - 933
  • [30] A subspace hierarchical clustering algorithm for categorical data
    Carbonera, Joel Luis
    Abel, Mara
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 509 - 516