A New Under-Sampling Method to Face Class Overlap and Imbalance

Cited by: 32
Authors
Guzman-Ponce, Angelica [1]
Valdovinos, Rosa Maria [1]
Sanchez, Jose Salvador [2]
Marcial-Romero, Jose Raymundo [1]
Affiliations
[1] Univ Autonoma Estado Mexico, Fac Ingn, Cerro Coatepec S-N,Ciudad Univ, Toluca 50100, Mexico
[2] Univ Jaume 1, Dept Comp Languages & Syst, Inst New Imaging Technol, Castellon de La Plana 12071, Spain
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 15
Keywords
class imbalance; class overlap; under-sampling; clustering; DBSCAN; minimum spanning tree; CLASSIFICATION; NETWORKS; IDENTIFICATION; PERFORMANCE; DATASETS; NOISY
DOI
10.3390/app10155164
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that combines the DBSCAN clustering algorithm to remove noisy samples and clean the decision boundary with a minimum spanning tree algorithm to face the class imbalance, thus handling class overlap and imbalance simultaneously with the aim of improving the performance of classifiers. An extensive experimental study shows a significantly better behavior of the new algorithm as compared to 12 state-of-the-art under-sampling methods using three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel) on both real-life and synthetic databases.
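The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details of either stage, so several choices here are assumptions made only for illustration. The noise filter is applied to the majority class alone, the pruning rule (repeatedly drop the tree leaf attached by the shortest edge) is a hypothetical heuristic, and the function names and the `eps` / `min_pts` values are invented for this example.

```python
import math


def dbscan_noise(points, eps, min_pts):
    """Indices of points that DBSCAN would label as noise.

    A point is a core point if at least min_pts points (itself included)
    lie within eps of it; noise is any non-core point with no core neighbor.
    """
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]
    return {i for i in range(n)
            if not core[i] and not any(core[j] for j in neigh[i])}


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, w)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def prune_by_mst(points, target):
    """ASSUMED heuristic: drop the leaf with the shortest edge until
    only `target` points remain. Returns the kept indices."""
    keep = set(range(len(points)))
    adj = {i: {} for i in keep}
    for i, j, w in mst_edges(points):
        adj[i][j] = w
        adj[j][i] = w
    while len(keep) > target and len(keep) > 1:
        leaves = [i for i in sorted(keep) if len(adj[i]) == 1]
        leaf = min(leaves, key=lambda i: next(iter(adj[i].values())))
        (nb,) = adj[leaf]          # a leaf has exactly one neighbor
        del adj[nb][leaf]
        del adj[leaf]
        keep.remove(leaf)
    return sorted(keep)


def two_stage_undersample(majority, minority, eps, min_pts):
    """Stage 1: remove DBSCAN noise from the majority class.
    Stage 2: prune along the MST down to the minority class size."""
    noise = dbscan_noise(majority, eps, min_pts)
    cleaned = [p for i, p in enumerate(majority) if i not in noise]
    return [cleaned[i] for i in prune_by_mst(cleaned, len(minority))]


# Toy data: five clustered majority points, one outlier, three minority points.
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
minority = [(5, 5), (5, 6), (6, 5)]
balanced_majority = two_stage_undersample(majority, minority, eps=1.5, min_pts=3)
```

On the toy data the isolated point (10, 10) is discarded in the first stage, and the second stage reduces the remaining majority points to the size of the minority class. The quadratic neighbor search is only suitable for small examples.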
Pages: 22
Related papers
50 in total
  • [21] A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios
    Alejo, R.
    Valdovinos, R. M.
    Garcia, V.
    Pacheco-Sanchez, J. H.
    PATTERN RECOGNITION LETTERS, 2013, 34 (04) : 380 - 388
  • [22] Alleviating Class Imbalance Issue in Software Fault Prediction Using DBSCAN-Based Induced Graph Under-Sampling Method
    Bhandari, Kirti
    Kumar, Kuldeep
    Sangal, Amrit Lal
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024, 49 (09) : 12589 - 12627
  • [23] Adaptive K-means clustering based under-sampling methods to solve the class imbalance problem
    Zhou Q.
    Sun B.
    Data and Information Management, 2024, 8 (03)
  • [24] Entropy and improved k-nearest neighbor search based under-sampling (ENU) method to handle class overlap in imbalanced datasets
    Kumar, Anil
    Singh, Dinesh
    Yadav, Rama Shankar
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023
  • [25] Entropy and improved k-nearest neighbor search based under-sampling (ENU) method to handle class overlap in imbalanced datasets
    Kumar, Anil
    Singh, Dinesh
    Yadav, Rama Shankar
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (02)
  • [26] An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification
    Arafat, Md. Yasir
    Hoque, Sabera
    Xu, Shuxiang
    Farid, Dewan Md.
    2019 13TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2019
  • [27] A Selective Under-Sampling (SUS) Method For Imbalanced Regression
    Aleksic, Jovana
    Garcia-Remesal, Miguel
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2025, 82 : 111 - 136
  • [28] QAM Signal Measurement Method based on Under-sampling
    Shimoda M.
    Otani A.
    IEEJ Transactions on Fundamentals and Materials, 2024, 144 (01) : 23 - 28
  • [29] Boosting the performance of over-sampling algorithms through under-sampling the minority class
    de Morais, Romero F. A. B.
    Vasconcelos, Germano C.
    NEUROCOMPUTING, 2019, 343 : 3 - 18
  • [30] Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset
    Yen, Show-Jane
    Lee, Yue-Shi
    INTELLIGENT CONTROL AND AUTOMATION, 2006, 344 : 731 - 740