SS-DBSCAN: Semi-Supervised Density-Based Spatial Clustering of Applications With Noise for Meaningful Clustering in Diverse Density Data

Times cited: 0
Authors
Zaki Abdulhameed, Tiba [1]
Yousif, Suhad A. [1]
Samawi, Venus W. [2]
Imad Al-Shaikhli, Hasnaa [1]
Affiliations
[1] Al Nahrain Univ, Coll Sci, Comp Sci Dept, Baghdad 64074, Iraq
[2] Isra Univ, Dept Smart Business, Amman 11622, Jordan
Source
IEEE Access | 2024 / Vol. 12
Keywords
Clustering algorithms; Noise measurement; Measurement; Complexity theory; Classification algorithms; Wireless sensor networks; Semisupervised learning; Unsupervised learning; Text categorization; Clustering; DBSCAN; semi-supervised clustering; unsupervised classification; word classification; Algorithm
DOI
10.1109/ACCESS.2024.3457587
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised clustering algorithm that identifies clusters of various shapes and sizes in noisy datasets by locating core points. Its main challenges are recognizing meaningful clusters in datasets of varying density and its sensitivity to the values of its two parameters: the Epsilon distance and the minimum number of neighboring points. Both issues can cause small clusters to be merged into larger ones or valid clusters to be split into smaller ones. To improve the recognition of meaningful clusters, a new Semi-Supervised DBSCAN (SS-DBSCAN) algorithm is introduced. In DBSCAN, a point is a core point when at least the minimum number of neighboring points lie within Epsilon distance of it. SS-DBSCAN, a modified version of the original DBSCAN, adds a pre-specified condition, or constraint, that a point must also satisfy to be identified as a core point. This extra constraint is related to the clustering objective of the given dataset. To evaluate the effectiveness of SS-DBSCAN, we use three datasets: letter recognition, wireless localization, and a language-modeling dataset that combines Modern Standard Arabic (MSA) and Iraqi words. V-measure is used to evaluate clustering quality on the letter recognition and wireless localization datasets. For the Iraqi-MSA dataset, clustering effectiveness is measured by the perplexity (PP) of a class-based language model built on the produced clusters. Experimental results show that SS-DBSCAN is highly effective: it outperforms DBSCAN on the letter and Iraqi-MSA datasets, with improvements of 65% and 14.5%, respectively, and achieves comparable performance on the wireless localization dataset. In addition, SS-DBSCAN was compared with various modified versions of DBSCAN using four metrics: V-measure, PP, Adjusted Rand Index (ARI), and the Silhouette score.
Based on these metrics, SS-DBSCAN outperformed most DBSCAN versions in the three case studies. Consequently, the proposed SS-DBSCAN algorithm is particularly suitable for high-density datasets. The SS-DBSCAN Python code is available at https://github.com/TibaZaki/SS_DBSCAN.
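The abstract describes SS-DBSCAN only at the level of the core-point test: a point must have enough neighbors within Epsilon and must also satisfy an extra, objective-specific constraint. As a minimal sketch, assuming that constraint can be written as a boolean predicate over a point and its Epsilon-neighborhood, the Python below adds such a predicate to a textbook DBSCAN and scores the output with V-measure. The names `constrained_dbscan` and `make_seed_constraint`, and the "no conflicting seed labels" constraint itself, are hypothetical illustrations; they are not the authors' constraint or the code in the SS_DBSCAN repository. The neighbor count here includes the point itself, which is one common DBSCAN convention.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

NOISE = -1  # label for points that end up in no cluster
Point = Tuple[float, ...]


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def constrained_dbscan(points: Sequence[Point], eps: float, min_pts: int,
                       constraint: Callable[[int, List[int]], bool] = lambda i, nbrs: True
                       ) -> List[int]:
    """Textbook DBSCAN with one change: point i is a core point only if it has
    at least min_pts neighbors within eps AND constraint(i, nbrs) is True.
    With the default always-True constraint this is plain DBSCAN."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet

    def core_test(i: int) -> Tuple[bool, List[int]]:
        nbrs = region_query(points, i, eps)
        is_core = len(nbrs) >= min_pts and constraint(i, nbrs)
        return is_core, nbrs

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        is_core, nbrs = core_test(i)
        if not is_core:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:  # grow the cluster outward from its core points
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # reachable non-core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_core, j_nbrs = core_test(j)
            if j_core:
                queue.extend(j_nbrs)
    return labels


def make_seed_constraint(seeds: Dict[int, Hashable]) -> Callable[[int, List[int]], bool]:
    """HYPOTHETICAL constraint: reject i as a core point if its neighborhood
    contains seed points carrying different known labels."""
    def constraint(i: int, nbrs: List[int]) -> bool:
        return len({seeds[j] for j in nbrs if j in seeds}) <= 1
    return constraint


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """H(A | B) for two label sequences of equal length."""
    n = len(a)
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bk])
                for (_, bk), c in Counter(zip(a, b)).items())


def v_measure(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Harmonic mean of homogeneity and completeness (beta = 1)."""
    h_c, h_k = entropy(truth), entropy(pred)
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(truth, pred) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(pred, truth) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy data: ten evenly spaced points on a line forming two "true" groups of five.
points = [(x,) for x in range(10)]
truth = [0] * 5 + [1] * 5
seeds = {4: "a", 5: "b"}  # hypothetical known labels for two points

plain = constrained_dbscan(points, eps=1.0, min_pts=3)
semi = constrained_dbscan(points, eps=1.0, min_pts=3,
                          constraint=make_seed_constraint(seeds))
print(plain)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(semi)   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(v_measure(truth, plain), v_measure(truth, semi))  # 0.0 1.0
```

On this toy line, plain DBSCAN chains every point into one cluster, which is the kind of merging of valid clusters the abstract describes. The hypothetical seed constraint denies core status to the two points whose neighborhoods hold conflicting seeds, so expansion stops there and the two groups separate. The V-measure follows the standard homogeneity/completeness definition; the noise label -1, if present, is simply treated as one more cluster.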
Pages: 131507-131520
Number of pages: 14