SS-DBSCAN: Semi-Supervised Density-Based Spatial Clustering of Applications With Noise for Meaningful Clustering in Diverse Density Data

Authors
Zaki Abdulhameed, Tiba [1]
Yousif, Suhad A. [1]
Samawi, Venus W. [2]
Imad Al-Shaikhli, Hasnaa [1]
Affiliations
[1] Al Nahrain Univ, Coll Sci, Comp Sci Dept, Baghdad 64074, Iraq
[2] Isra Univ, Dept Smart Business, Amman 11622, Jordan
Source
IEEE Access | 2024 / Vol. 12
Keywords
Clustering algorithms; Noise measurement; Measurement; Complexity theory; Classification algorithms; Wireless sensor networks; Semisupervised learning; Unsupervised learning; Text categorization; Clustering; DBSCAN; semi-supervised clustering; unsupervised classification; word classification; ALGORITHM;
DOI
10.1109/ACCESS.2024.3457587
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised clustering algorithm that identifies clusters of arbitrary shape and size in noisy datasets by locating core points. Its main challenges are recognizing meaningful clusters in datasets of varying density and its sensitivity to the values of its two parameters: the Epsilon distance and the minimum number of neighboring points. These two issues can cause small clusters to be merged into larger ones or valid clusters to be split into smaller ones. This paper introduces a new Semi-Supervised DBSCAN (SS-DBSCAN) algorithm to improve the recognition of meaningful clusters. DBSCAN requires a core point to have at least the minimum number of neighboring points within Epsilon distance of it. SS-DBSCAN, a modified version of the original DBSCAN, adds a pre-specified condition, or constraint, that a point must also satisfy to be identified as a core point. This extra constraint is related to the clustering objective of the given dataset. To evaluate the effectiveness of SS-DBSCAN, we use three datasets: letter recognition, wireless localization, and a language-modeling dataset that combines Modern Standard Arabic (MSA) with Iraqi words. V-measure is used to evaluate clustering quality on the letter recognition and wireless localization datasets. For the Iraqi-MSA dataset, clustering effectiveness is measured by the perplexity (PP) of a class-based language model built on the produced clusters. Experimental results demonstrate the effectiveness of SS-DBSCAN: it outperforms DBSCAN on the letter recognition and Iraqi-MSA datasets, with improvements of 65% and 14.5%, respectively, and achieves comparable performance on the wireless localization dataset. Additionally, SS-DBSCAN has been compared with various modified versions of DBSCAN using four metrics: V-measure, PP, Adjusted Rand Index (ARI), and the Silhouette score.
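V-measure, used for the letter recognition and wireless localization datasets, is the harmonic mean of homogeneity (each cluster contains members of only one class) and completeness (all members of a class fall into the same cluster). The following is a minimal pure-Python sketch of the standard definition, with `beta = 1` giving the harmonic mean. The function names are illustrative assumptions; the abstract does not say which implementation the authors used, and a library routine such as scikit-learn's `sklearn.metrics.v_measure_score` computes the same quantity.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(X), natural log, of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """H(A | B) for two aligned label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((n_ab / n) * math.log(n_ab / b_counts[y]) for (_, y), n_ab in joint.items())


def v_measure(classes: Sequence[Hashable], clusters: Sequence[Hashable], beta: float = 1.0) -> float:
    """V-measure of a clustering against ground-truth classes."""
    h_c, h_k = entropy(classes), entropy(clusters)
    # Convention: with a single class any clustering is homogeneous,
    # and a single cluster is trivially complete.
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)


if __name__ == "__main__":
    print(v_measure([0, 0, 1, 1], [5, 5, 7, 7]))            # 1.0 (perfect match up to relabeling)
    print(round(v_measure([0, 0, 1, 1], [0, 0, 1, 2]), 6))  # 0.8 (one class split across two clusters)
```

Because V-measure depends only on how points are grouped, not on the cluster ids, it can compare SS-DBSCAN and DBSCAN outputs directly against ground-truth labels; note that DBSCAN-style noise labels (for example `-1`) are treated here as one ordinary cluster.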
Based on these four metrics, SS-DBSCAN outperformed most of the DBSCAN variants across the three case studies. Consequently, the proposed SS-DBSCAN algorithm is particularly suitable for high-density datasets. The SS-DBSCAN Python code is available at https://github.com/TibaZaki/SS_DBSCAN.
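The core-point test described in the abstract can be sketched as follows. This is not the code from the linked repository; it is a minimal sketch assuming Euclidean distance, a brute-force neighbor search, and a caller-supplied predicate `extra_constraint(i, neighbors)` that stands in for SS-DBSCAN's dataset-specific condition, whose exact form the abstract does not give. The names `ss_dbscan`, `region_query`, and `make_label_consistency_constraint` are hypothetical, and the label-consistency rule is only an illustrative example of a semi-supervised constraint.

```python
import math
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical hook: (point index, indices of its eps-neighbors) -> bool
Constraint = Callable[[int, List[int]], bool]


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself (brute force)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def ss_dbscan(points: Sequence[Point], eps: float, min_pts: int,
              extra_constraint: Constraint) -> List[int]:
    """DBSCAN whose core-point test also requires extra_constraint (hypothetical hook).

    Returns one label per point: cluster ids 0, 1, ... or -1 for noise.
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def is_core(i: int, nbrs: List[int]) -> bool:
        # Standard DBSCAN density test plus the additional, dataset-specific condition.
        return len(nbrs) >= min_pts and extra_constraint(i, nbrs)

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region_query(points, i, eps)
        if not is_core(i, nbrs):
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        k = 0
        while k < len(seeds):  # expand the cluster outward from core point i
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = region_query(points, j, eps)
            if is_core(j, j_nbrs):
                seeds.extend(j_nbrs)  # only core points propagate the cluster
    return [lab if lab is not None else -1 for lab in labels]


def make_label_consistency_constraint(known: Dict[int, Hashable]) -> Constraint:
    """Illustrative (hypothetical) constraint: a point can be core only if all
    labeled points in its eps-neighborhood carry the same known label."""
    def constraint(i: int, nbrs: List[int]) -> bool:
        return len({known[j] for j in nbrs if j in known}) <= 1
    return constraint


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(ss_dbscan(pts, 1.5, 3, lambda i, nbrs: True))
    # [0, 0, 0, 1, 1, 1, -1]  (constraint always true: plain DBSCAN)
    print(ss_dbscan(pts, 1.5, 3, make_label_consistency_constraint({0: "a", 1: "b"})))
    # [-1, -1, -1, 0, 0, 0, -1]  (conflicting known labels block the first group's core points)
```

With a constraint that always returns `True`, the sketch reduces to plain DBSCAN. A stricter constraint can only remove core points, so in this sketch it tends to split or shrink clusters rather than merge them, which is one way an extra condition could counter the merging of small clusters into larger ones.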
Pages: 131507–131520
Number of pages: 14