Data Stream Clustering: Introducing Recursively Extendable Aggregation Functions for Incremental Cluster Fusion Processes

Authors
Urio-Larrea, A. [1]
Camargo, H. [2]
Lucca, G. [3]
Asmus, T. [4, 5]
Marco-Detchart, C. [1]
Schick, L. [2]
Lopez-Molina, C. [1]
Andreu-Perez, J. [6]
Bustince, H. [1]
Dimuro, G. P. [4, 5]
Affiliations
[1] Univ Publ Navarra, Dept Estadist, Pamplona 31006, Spain
[2] Univ Fed Sao Carlos, Dept Computac, BR-13565905 Sao Carlos, Brazil
[3] Univ Catolica Pelotas, Ctr Ciencias Sociais & Tecnol, BR-96015560 Pelotas, Brazil
[4] Univ Fed Rio Grande, Inst Matemat Estat & Fisica, BR-96203900 Rio Grande, Brazil
[5] Univ Fed Rio Grande, Ctr Ciencias Computacionais, BR-96203900 Rio Grande, Brazil
[6] Univ Essex, Sch Comp Sci & Elect Engn, Colchester, England
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Aggregation functions; data streams (DSs); fuzzy clustering; overlap indices; similarity measures;
DOI
10.1109/TCYB.2025.3527862
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In data stream (DS) learning, the system has to extract knowledge from data generated continuously, usually at high speed and in large volumes, making it impossible to store the entire set of data to be processed in batch mode. Hence, machine learning models must be built incrementally by processing the incoming examples, as data arrive, while updating the model to be compatible with the current data. In fuzzy DS clustering, the model can either absorb incoming data into existing clusters or initiate a new cluster. As the volume of data increases, there is a possibility that the clusters will overlap to the point where it is convenient to merge two or more clusters into one. Then, a cluster comparison measure (CM) should be applied, to decide whether such clusters should be combined, also in an incremental manner. This defines an incremental fusion process based on aggregation functions that can aggregate the incoming inputs without storing all the previous inputs. The objective of this article is to solve the fuzzy DS clustering problem of incrementally comparing fuzzy clusters on a formal basis. First, we formalize and operationalize incremental fusion processes of fuzzy clusters by introducing recursively extendable (RE) aggregation functions, studying construction methods and different classes of such functions. Second, we propose two approaches to compare clusters: 1) similarity and 2) overlapping between clusters, based on RE aggregation functions. Finally, we analyze the effect of those incremental CMs on the online and offline phases of the well-known fuzzy clustering algorithm d-FuzzStream, showing that our new approach outperforms the original algorithm and presents better or comparable performance to other state-of-the-art DS clustering algorithms found in the literature.
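The idea of aggregating incoming inputs without storing all previous ones can be illustrated with a minimal Python sketch. It uses the arithmetic mean only as a familiar example of an aggregate that can be updated one input at a time; the abstract does not give the formal definition of recursively extendable aggregation functions, so this is not the authors' construction. The names `RunningMean` and `batch_mean` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Arithmetic mean updated one input at a time (hypothetical helper).

    Only the current aggregate and the number of inputs seen are stored,
    never the inputs themselves.
    """
    count: int = 0
    value: float = 0.0

    def update(self, x: float) -> float:
        # Fold the new input into the stored aggregate.
        self.count += 1
        self.value += (x - self.value) / self.count
        return self.value


def batch_mean(xs):
    """Reference computation that needs the whole list in memory."""
    return sum(xs) / len(xs)


# Illustrative membership-like values in [0, 1]; not data from the paper.
stream = [0.2, 0.9, 0.4, 0.7]
agg = RunningMean()
for x in stream:
    agg.update(x)
```

After the loop, `agg.value` agrees with `batch_mean(stream)` up to floating-point rounding, although `agg` never held more than two numbers at a time.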
Pages: 1421-1435
Page count: 15