An Entropy-Based Clustering Algorithm for Real-Time High-Dimensional IoT Data Streams

Citations: 0

Authors:

Mutambik, Ibrahim [1]

Affiliation:

[1] King Saud Univ, Coll Humanities & Social Sci, Dept Informat Sci, POB 11451, Riyadh 4545, Saudi Arabia

Source:

SENSORS | 2024 / Volume 24 / Issue 22

Keywords:

Internet of Things (IoT); IoT data clustering; NSL-KDD dataset; memory consumption; sliding time window; COOPERATIVE NOMA; INFORMATION;

DOI:

10.3390/s24227412

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Subject classification codes:

070302 ; 081704 ;

Abstract:

The rapid growth of data streams, propelled by the proliferation of sensors and Internet of Things (IoT) devices, presents significant challenges for real-time clustering of high-dimensional data. Traditional clustering algorithms struggle with high dimensionality, memory and time constraints, and adapting to dynamically evolving data. Existing dimensionality reduction methods often neglect feature ranking, leading to suboptimal clustering performance. To address these issues, we introduce E-Stream, a novel entropy-based clustering algorithm for high-dimensional data streams. E-Stream performs real-time feature ranking based on entropy within a sliding time window to identify the most informative features, which are then utilized with the DenStream algorithm for efficient clustering. We evaluated E-Stream using the NSL-KDD dataset, comparing it against DenStream, CluStream, and MR-Stream. The evaluation metrics included the average F-Measure, Jaccard Index, Fowlkes-Mallows Index, Purity, and Rand Index. The results show that E-Stream outperformed the baseline algorithms in both clustering accuracy and computational efficiency while effectively reducing dimensionality. E-Stream also demonstrated significantly less memory consumption and fewer computational requirements, highlighting its suitability for real-time processing of high-dimensional data streams. Despite its strengths, E-Stream requires manual parameter adjustment and assumes a consistent number of active features, which may limit its adaptability to diverse datasets. Future work will focus on developing a fully autonomous, parameter-free version of the algorithm, incorporating mechanisms to handle missing features and improving the management of evolving clusters to enhance robustness and adaptability in dynamic IoT environments.
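The entropy-based feature ranking described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not state the entropy estimator, the window definition, or the ranking direction, so the equal-width histogram estimator, the count-based window (standing in for a time window), the `ascending` switch, and the names `shannon_entropy` and `SlidingEntropyRanker` are all assumptions made for illustration. The downstream DenStream clustering step is omitted.

```python
from collections import deque
import math


def shannon_entropy(values, bins=10):
    """Shannon entropy (bits) of an equal-width histogram of `values`.

    The histogram estimator is an assumption; the abstract does not say
    how E-Stream estimates entropy.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature has zero entropy
    counts = [0] * bins
    for v in values:
        # Map v to a bin index; the maximum value is clipped into the last bin.
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


class SlidingEntropyRanker:
    """Hypothetical helper: ranks features by entropy over recent points."""

    def __init__(self, window_size=100, bins=10):
        # Count-based window (assumption); old points are evicted automatically.
        self.window = deque(maxlen=window_size)
        self.bins = bins

    def update(self, point):
        """Add one numeric feature vector from the stream."""
        self.window.append(tuple(point))

    def ranked_features(self, ascending=True):
        """Return feature indices sorted by entropy within the window.

        Which direction counts as "most informative" is not stated in the
        abstract, so it is left to the caller.
        """
        columns = list(zip(*self.window))  # one tuple of values per feature
        scores = [shannon_entropy(col, self.bins) for col in columns]
        return sorted(range(len(scores)), key=lambda i: scores[i],
                      reverse=not ascending)

    def top_k(self, k, ascending=True):
        """Indices of the k top-ranked features, to pass to a clusterer."""
        return self.ranked_features(ascending)[:k]


if __name__ == "__main__":
    ranker = SlidingEntropyRanker(window_size=50)
    for i in range(100):
        # Feature 0 is constant, feature 1 alternates, feature 2 spreads out.
        ranker.update((1.0, float(i % 2), float(i)))
    print(ranker.top_k(2, ascending=False))
```

In a pipeline of this shape, each incoming point would be projected onto the selected indices before being handed to the stream clusterer, so the clusterer only ever sees the reduced feature set.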


Pages: 16
