SS-DBSCAN: Semi-Supervised Density-Based Spatial Clustering of Applications With Noise for Meaningful Clustering in Diverse Density Data

Cited by: 0
Authors
Zaki Abdulhameed, Tiba [1]
Yousif, Suhad A. [1]
Samawi, Venus W. [2]
Imad Al-Shaikhli, Hasnaa [1]
Affiliations
[1] Al Nahrain Univ, Coll Sci, Comp Sci Dept, Baghdad 64074, Iraq
[2] Isra Univ, Dept Smart Business, Amman 11622, Jordan
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Clustering algorithms; Noise measurement; Measurement; Complexity theory; Classification algorithms; Wireless sensor networks; Semisupervised learning; Unsupervised learning; Text categorization; Clustering; DBSCAN; semi-supervised clustering; unsupervised classification; word classification; ALGORITHM;
DOI
10.1109/ACCESS.2024.3457587
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised clustering algorithm designed to identify clusters of various shapes and sizes in noisy datasets by pinpointing core points. The primary challenges associated with DBSCAN are recognizing meaningful clusters in datasets of varying density and its sensitivity to two parameters: the Epsilon distance and the minimum number of neighboring points. These two issues may result in small clusters being merged into larger ones or valid clusters being split into smaller ones. A new Semi-Supervised DBSCAN (SS-DBSCAN) algorithm is introduced to improve the recognition of meaningful clusters. DBSCAN requires a core point to have at least the minimum number of neighboring points within Epsilon distance. SS-DBSCAN, a modified version of the original DBSCAN, adds a pre-specified condition, or constraint, that a point must also satisfy to be identified as a core point. This extra constraint is related to the clustering objective of a given dataset. To evaluate the effectiveness of SS-DBSCAN, we use three datasets: letter recognition, wireless localization, and Modern Standard Arabic (MSA) combined with Iraqi words for language modeling. V-measure is used to evaluate clustering quality on the letter recognition and wireless localization datasets. For the Iraqi-MSA dataset, the metric is the perplexity (PP) of the class-based language model built on the produced clusters. Experimental results showed that SS-DBSCAN outperforms DBSCAN on the letter recognition and Iraqi-MSA datasets, with improvements of 65% and 14.5%, respectively, and achieves comparable performance on the wireless localization dataset. Additionally, SS-DBSCAN was compared with various modified versions of DBSCAN using four metrics: V-measure, PP, Adjusted Rand Index (ARI), and the Silhouette score. Based on these metrics, SS-DBSCAN outperformed most DBSCAN versions in the three case studies. Consequently, the proposed SS-DBSCAN algorithm is particularly suitable for high-density datasets. The SS-DBSCAN Python code is available at https://github.com/TibaZaki/SS_DBSCAN.
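
The core-point idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above); it is a plain DBSCAN loop in which a point counts as a core point only if it meets the usual density condition and also passes an extra predicate. The names `constrained_dbscan` and `is_allowed_core` are hypothetical, and the predicate stands in for whatever dataset-specific constraint the clustering objective calls for. A point is counted among its own neighbors here, which is an assumption of this sketch.

```python
from math import dist
from typing import Callable, List, Optional, Sequence

NOISE = -1  # label for points that end up in no cluster


def constrained_dbscan(
    points: Sequence[Sequence[float]],
    eps: float,
    min_pts: int,
    is_allowed_core: Callable[[int], bool],  # hypothetical extra constraint
) -> List[int]:
    """DBSCAN variant: a point is a core point only if it has at least
    min_pts neighbors within eps (itself included) AND is_allowed_core(i)."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def neighbors(i: int) -> List[int]:
        # Brute-force O(n) range query; fine for a small illustration.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    def is_core(i: int, nbrs: List[int]) -> bool:
        return len(nbrs) >= min_pts and is_allowed_core(i)

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if not is_core(i, nbrs):
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if is_core(j, j_nbrs):
                queue.extend(j_nbrs)  # only core points expand the cluster
    return [label if label is not None else NOISE for label in labels]


# Seven evenly spaced points; index 3 acts as a bridge between two groups.
points = [(float(x), 0.0) for x in range(7)]
plain = constrained_dbscan(points, eps=1.0, min_pts=3, is_allowed_core=lambda i: True)
constrained = constrained_dbscan(points, eps=1.0, min_pts=3, is_allowed_core=lambda i: i != 3)
```

With the always-true predicate the sketch behaves like ordinary DBSCAN and the chain of points forms a single cluster. Forbidding the bridge point from being a core point stops the expansion there, so the chain splits into two clusters, which is the kind of merge-prevention the abstract attributes to the extra constraint.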
Pages: 131507 - 131520
Number of pages: 14