An Entropy-Based Clustering Algorithm for Real-Time High-Dimensional IoT Data Streams

Cited by: 0
Authors
Mutambik, Ibrahim [1]
Affiliations
[1] King Saud Univ, Coll Humanities & Social Sci, Dept Informat Sci, POB 11451, Riyadh 4545, Saudi Arabia
Keywords
Internet of Things (IoT); IoT data clustering; NSL-KDD dataset; memory consumption; sliding time window; COOPERATIVE NOMA; INFORMATION;
DOI
10.3390/s24227412
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline classification codes
070302; 081704;
Abstract
The rapid growth of data streams, propelled by the proliferation of sensors and Internet of Things (IoT) devices, presents significant challenges for real-time clustering of high-dimensional data. Traditional clustering algorithms struggle with high dimensionality, memory and time constraints, and adapting to dynamically evolving data. Existing dimensionality reduction methods often neglect feature ranking, leading to suboptimal clustering performance. To address these issues, we introduce E-Stream, a novel entropy-based clustering algorithm for high-dimensional data streams. E-Stream performs real-time feature ranking based on entropy within a sliding time window to identify the most informative features, which are then utilized with the DenStream algorithm for efficient clustering. We evaluated E-Stream using the NSL-KDD dataset, comparing it against DenStream, CluStream, and MR-Stream. The evaluation metrics included the average F-Measure, Jaccard Index, Fowlkes-Mallows Index, Purity, and Rand Index. The results show that E-Stream outperformed the baseline algorithms in both clustering accuracy and computational efficiency while effectively reducing dimensionality. E-Stream also demonstrated significantly less memory consumption and fewer computational requirements, highlighting its suitability for real-time processing of high-dimensional data streams. Despite its strengths, E-Stream requires manual parameter adjustment and assumes a consistent number of active features, which may limit its adaptability to diverse datasets. Future work will focus on developing a fully autonomous, parameter-free version of the algorithm, incorporating mechanisms to handle missing features and improving the management of evolving clusters to enhance robustness and adaptability in dynamic IoT environments.
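The abstract describes ranking features by entropy within a sliding time window and passing the top-ranked features to DenStream. The following is a minimal sketch of that ranking step only, not the paper's implementation. Several details are assumptions: the entropy is taken as Shannon entropy over equal-width histogram bins, higher entropy is treated as more informative, and the names `shannon_entropy`, `SlidingEntropyRanker`, `window_seconds`, `top_k`, and `bins` are hypothetical. The DenStream stage is omitted.

```python
import math
from collections import deque


def shannon_entropy(values, bins=10):
    """Shannon entropy (in bits) of values discretized into equal-width bins.

    Assumption: the abstract does not specify the entropy estimator, so a
    simple histogram estimate is used here.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


class SlidingEntropyRanker:
    """Keeps points from the last `window_seconds` and ranks features by entropy."""

    def __init__(self, window_seconds, top_k, bins=10):
        self.window_seconds = window_seconds
        self.top_k = top_k
        self.bins = bins
        self.window = deque()  # (timestamp, feature_vector) pairs, oldest first

    def add(self, timestamp, features):
        """Append a point and evict points that fell out of the time window."""
        self.window.append((timestamp, list(features)))
        cutoff = timestamp - self.window_seconds
        while self.window and self.window[0][0] <= cutoff:
            self.window.popleft()

    def ranked_features(self):
        """Indices of the top_k features, highest entropy first (assumed order)."""
        if not self.window:
            return []
        dims = len(self.window[0][1])
        scores = [
            shannon_entropy([point[d] for _, point in self.window], self.bins)
            for d in range(dims)
        ]
        order = sorted(range(dims), key=lambda d: scores[d], reverse=True)
        return order[: self.top_k]

    def project(self, features):
        """Reduce a point to the currently top-ranked features."""
        return [features[d] for d in self.ranked_features()]


if __name__ == "__main__":
    ranker = SlidingEntropyRanker(window_seconds=10, top_k=2)
    for t in range(8):
        # feature 0 is constant, feature 1 alternates, feature 2 varies widely
        ranker.add(t, [1.0, t % 2, t])
    print(ranker.ranked_features())
    print(ranker.project([1.0, 0, 5]))
```

In a full pipeline, the projected points would then be fed to a density-based stream clusterer such as DenStream; recomputing the ranking on every call, as done here, keeps the sketch short but would likely be cached or updated incrementally in practice.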
Pages: 16