Data stream clustering: a review

Cited by: 0
Authors
Alaettin Zubaroğlu
Volkan Atalay
Affiliations
[1] Middle East Technical University,Department of Computer Engineering
Keywords
Data streams; Data stream clustering; Real-time clustering
DOI
Not available
Abstract
The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is attracting growing interest despite its many challenges. Clustering is one of the most suitable methods for real-time data stream processing because it can be applied with less prior information about the data and does not need labeled instances. However, data stream clustering differs from traditional clustering in many respects and poses several challenges of its own. Here, we describe the concepts and common characteristics of data streams, such as concept drift, data structures for data streams, time window models and outlier detection. We comprehensively review recent data stream clustering algorithms and analyze them in terms of base clustering technique, computational complexity and clustering accuracy. We compare these algorithms, list popular data stream repositories and datasets as well as stream processing tools and platforms, and discuss open problems in data stream clustering.
Pages: 1201 - 1236
Page count: 35
Related papers
50 items in total
  • [41] Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms
    Carnein, Matthias
    Trautmann, Heike
    BUSINESS & INFORMATION SYSTEMS ENGINEERING, 2019, 61 (03) : 277 - 297
  • [42] Autonomous Data-driven Clustering for Live Data Stream
    Gu, Xiaowei
    Angelov, Plamen P.
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 1128 - 1135
  • [43] Evolving data stream clustering based on constant false clustering probability
    Kashani, Elham S.
    Shouraki, Saeed Bagheri
    Norouzi, Yaser
    INFORMATION SCIENCES, 2022, 614 : 1 - 18
  • [44] Data clustering: A review
    Jain, AK
    Murty, MN
    Flynn, PJ
    ACM COMPUTING SURVEYS, 1999, 31 (03) : 264 - 323
  • [45] Advances in Rough and Soft Clustering: Meta-Clustering, Dynamic Clustering, Data-Stream Clustering
    Lingras, Pawan
    Triff, Matt
    ROUGH SETS, (IJCRS 2016), 2016, 9920 : 3 - 22
  • [46] A Review on Data Stream Classification
    Haneen, A. A.
    Noraziah, A.
    Abd Wahab, Mohd Helmy
    1ST INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (ICOBIC) 2017, 2018, 1018
  • [47] Data stream classification: a review
    Kapil K. Wankhade
    Snehlata S. Dongre
    Kalpana C. Jondhale
    Iran Journal of Computer Science, 2020, 3 (4) : 239 - 260
  • [48] MuDi-Stream: A multi density clustering algorithm for evolving data stream
    Amini, Amineh
    Saboohi, Hadi
    Herawan, Tutut
    Teh Ying Wah
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2016, 59 : 370 - 385
  • [49] Research on Parallelized Stream Data Micro Clustering Algorithm
    Ma, Ke
    Li, Lingjuan
    Ji, Yimu
    Luo, Shengmei
    Wen, Tao
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCES IN MECHANICAL ENGINEERING AND INDUSTRIAL INFORMATICS, 2015, 15 : 629 - 634
  • [50] Anomaly detection model based on data stream clustering
    Chunyong Yin
    Sun Zhang
    Zhichao Yin
    Jin Wang
    Cluster Computing, 2019, 22 : 1729 - 1738