SS-DBSCAN: Semi-Supervised Density-Based Spatial Clustering of Applications With Noise for Meaningful Clustering in Diverse Density Data

Authors
Zaki Abdulhameed, Tiba [1]
Yousif, Suhad A. [1]
Samawi, Venus W. [2]
Imad Al-Shaikhli, Hasnaa [1]
Affiliations
[1] Al Nahrain Univ, Coll Sci, Comp Sci Dept, Baghdad 64074, Iraq
[2] Isra Univ, Dept Smart Business, Amman 11622, Jordan
Source
IEEE Access, 2024, Vol. 12
Keywords
Clustering algorithms; Noise measurement; Measurement; Complexity theory; Classification algorithms; Wireless sensor networks; Semisupervised learning; Unsupervised learning; Text categorization; Clustering; DBSCAN; semi-supervised clustering; unsupervised classification; word classification; ALGORITHM;
DOI
10.1109/ACCESS.2024.3457587
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised clustering algorithm designed to identify clusters of various shapes and sizes in noisy datasets by pinpointing core points. The primary challenges of DBSCAN are recognizing meaningful clusters in datasets of varying density and its sensitivity to the values of its two parameters: the Epsilon distance and the minimum number of neighboring points. These two issues may cause small clusters to be merged into larger ones or valid clusters to be split into smaller ones. A new Semi-Supervised DBSCAN (SS-DBSCAN) algorithm is introduced to improve the recognition of meaningful clusters. In DBSCAN, a point is a core point if at least the minimum number of neighboring points lie within Epsilon distance of it. SS-DBSCAN, a modified version of the original DBSCAN, adds a pre-specified condition, or constraint, that a point must also satisfy to be identified as a core point. This extra constraint is related to the clustering objective of the given dataset. To evaluate the effectiveness of SS-DBSCAN, we use three datasets: letter recognition, wireless localization, and Modern Standard Arabic (MSA) combined with Iraqi words for language modeling. V-measure is used to evaluate clustering performance on the letter recognition and wireless localization datasets. For the Iraqi-MSA dataset, clustering effectiveness is measured by the perplexity (PP) of a class-based language model built on the produced clusters. Experimental results show that SS-DBSCAN is highly effective: it outperforms DBSCAN on the letter and Iraqi-MSA datasets, with improvements of 65% and 14.5%, respectively, and achieves comparable performance on the wireless localization dataset. In addition, SS-DBSCAN was compared with various modified versions of DBSCAN using four metrics: V-measure, PP, Adjusted Rand Index (ARI), and the Silhouette score. Based on these metrics, SS-DBSCAN outperformed most DBSCAN versions in the three case studies. Consequently, the proposed SS-DBSCAN algorithm is particularly suitable for high-density datasets. The SS-DBSCAN Python code is available at https://github.com/TibaZaki/SS_DBSCAN.
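The core-point rule described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' SS-DBSCAN implementation, which is in the linked repository. The sketch assumes that the extra constraint can be expressed as a user-supplied predicate `core_constraint(i, neighbours)`, which a point must satisfy in addition to the usual Epsilon/minimum-points test. The names `constrained_dbscan` and `label_purity_constraint` are hypothetical, and so is the label-purity rule itself.

```python
import math
from typing import Callable, Dict, Hashable, List, Optional, Sequence

Point = Sequence[float]
# Hypothetical predicate: (point index, indices of its eps-neighbours) -> may it be core?
CoreConstraint = Callable[[int, List[int]], bool]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Brute-force eps-neighbourhood of point i (includes i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def constrained_dbscan(points: Sequence[Point], eps: float, min_pts: int,
                       core_constraint: CoreConstraint) -> List[int]:
    """DBSCAN in which a core point must also pass core_constraint.

    Returns one cluster id per point; NOISE (-1) marks unassigned points.
    """
    n = len(points)
    neighbours = [region_query(points, i, eps) for i in range(n)]
    # Standard density test plus the extra (hypothetical) constraint.
    is_core = [len(neighbours[i]) >= min_pts and core_constraint(i, neighbours[i])
               for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()  # only core points are ever pushed
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    if is_core[q]:
                        stack.append(q)  # expand only through core points
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


def label_purity_constraint(known: Dict[int, Hashable]) -> CoreConstraint:
    """Hypothetical constraint: a point may be core only if its
    eps-neighbourhood contains at most one distinct known label."""
    def check(i: int, nbrs: List[int]) -> bool:
        return len({known[j] for j in nbrs if j in known}) <= 1
    return check


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    # Always-true constraint reduces the sketch to plain DBSCAN.
    print(constrained_dbscan(pts, 1.5, 3, lambda i, nbrs: True))
    # -> [0, 0, 0, 1, 1, 1, -1]
    # Conflicting known labels in the first group block its core points.
    print(constrained_dbscan(pts, 1.5, 3, label_purity_constraint({0: "a", 2: "b"})))
    # -> [-1, -1, -1, 0, 0, 0, -1]
```

Passing a constraint that always returns `True` recovers ordinary DBSCAN, so the constraint can only remove core points. The brute-force neighbourhood search is O(n²). That is adequate for illustration but not for large datasets.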
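V-measure, which the abstract uses for the letter recognition and wireless localization datasets, is the weighted harmonic mean of homogeneity and completeness. Both quantities are defined from conditional entropies between ground-truth classes and produced clusters. The sketch below computes V-measure with the standard library only. It follows the standard definition (Rosenberg and Hirschberg, 2007), which is also what scikit-learn's `v_measure_score` implements. The function names are illustrative, and none of this code is taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) in nats of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """H(A | B) in nats, estimated from paired label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((n_ab / n) * math.log(n_ab / b_counts[bv])
                for (_, bv), n_ab in joint.items())


def v_measure(classes: Sequence[Hashable], clusters: Sequence[Hashable],
              beta: float = 1.0) -> float:
    """Weighted harmonic mean of homogeneity and completeness."""
    h_c, h_k = entropy(classes), entropy(clusters)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    # Completeness: all members of a class fall into the same cluster.
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)


if __name__ == "__main__":
    print(v_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # relabelled perfect match -> 1.0
    print(round(v_measure([0, 0, 1, 1], [0, 0, 1, 2]), 6))  # one class split -> 0.8
```

V-measure ignores how cluster ids are named, so a perfect clustering with permuted ids still scores 1.0. Splitting a class across clusters lowers completeness but leaves homogeneity at 1.0.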
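For reference, the perplexity used to score the Iraqi-MSA clusters is conventionally defined below for a test sequence of N words. The class-based bigram factorization is an assumption: it is the standard formulation in which each word w_i is mapped to its cluster c_i. The abstract does not state the model order or smoothing the authors used.

```latex
\mathrm{PP}(w_1,\dots,w_N) = \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_{i-1})\Big),
\qquad
P(w_i \mid w_{i-1}) \approx P(w_i \mid c_i)\,P(c_i \mid c_{i-1})
```

Lower perplexity means that the class-based model, and hence the clustering it is built on, predicts held-out text better.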
Pages: 131507-131520
Number of pages: 14