A New Under-Sampling Method to Face Class Overlap and Imbalance

Cited by: 32
Authors
Guzman-Ponce, Angelica [1]
Valdovinos, Rosa Maria [1]
Sanchez, Jose Salvador [2]
Marcial-Romero, Jose Raymundo [1]
Affiliations
[1] Univ Autonoma Estado Mexico, Fac Ingn, Cerro Coatepec S-N,Ciudad Univ, Toluca 50100, Mexico
[2] Univ Jaume 1, Dept Comp Languages & Syst, Inst New Imaging Technol, Castellon de La Plana 12071, Spain
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 15
Keywords
class imbalance; class overlap; under-sampling; clustering; DBSCAN; minimum spanning tree; CLASSIFICATION; NETWORKS; IDENTIFICATION; PERFORMANCE; DATASETS; NOISY;
DOI
10.3390/app10155164
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that combines the DBSCAN clustering algorithm to remove noisy samples and clean the decision boundary with a minimum spanning tree algorithm to face the class imbalance, thus handling class overlap and imbalance simultaneously with the aim of improving the performance of classifiers. An extensive experimental study shows a significantly better behavior of the new algorithm as compared to 12 state-of-the-art under-sampling methods using three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel) on both real-life and synthetic databases.
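The two-stage idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' algorithm: the abstract does not state which class DBSCAN is applied to or how the minimum spanning tree is pruned, so the sketch assumes that both stages act on the majority class and uses a hypothetical pruning rule (drop one endpoint of the shortest tree edge until the class sizes match). The names `dbscan_noise`, `mst_edges`, `undersample`, `eps`, and `min_pts` are illustrative.

```python
import math

def dbscan_noise(points, eps, min_pts):
    """Return the indices that DBSCAN would label as noise."""
    n = len(points)
    # eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    # Noise = not a core point and not within eps of any core point.
    return {i for i in range(n)
            if not core[i] and not any(core[j] for j in neighbors[i])}

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph: list of (weight, u, v)."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (cheapest link to the tree, tree vertex)
    if n:
        best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges

def undersample(majority, minority_size, eps, min_pts):
    """Stage 1: drop DBSCAN noise. Stage 2: prune along the MST (assumed rule)."""
    noise = dbscan_noise(majority, eps, min_pts)
    kept = [p for i, p in enumerate(majority) if i not in noise]
    # Hypothetical pruning rule, not taken from the paper: rebuild the tree and
    # remove one endpoint of its shortest edge until the classes are balanced.
    while len(kept) > max(minority_size, 1):
        _, _, v = min(mst_edges(kept))
        del kept[v]
    return kept

# Toy majority class: a tight cluster of five points plus one isolated point.
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
reduced = undersample(majority, minority_size=3, eps=1.5, min_pts=3)
```

Rebuilding the tree after every removal keeps the sketch short but costs roughly cubic time in the number of majority samples, so it is only suitable for small illustrations.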
Pages: 22
Related papers
50 in total
  • [1] RFCL: A new under-sampling method of reducing the degree of imbalance and overlap
    Rui Zhang
    Zuoquan Zhang
    Di Wang
    Pattern Analysis and Applications, 2021, 24 : 641 - 654
  • [2] RFCL: A new under-sampling method of reducing the degree of imbalance and overlap
    Zhang, Rui
    Zhang, Zuoquan
    Wang, Di
    PATTERN ANALYSIS AND APPLICATIONS, 2021, 24 (02) : 641 - 654
  • [3] A Hybrid Evolutionary Under-sampling Method for Handling the Class Imbalance Problem with Overlap in Credit Classification
    Ping Gong
    Junguang Gao
    Li Wang
    Journal of Systems Science and Systems Engineering, 2022, 31 : 728 - 752
  • [4] A Hybrid Evolutionary Under-sampling Method for Handling the Class Imbalance Problem with Overlap in Credit Classification
    Gong, Ping
    Gao, Junguang
    Wang, Li
    JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING, 2022, 31 (06) : 728 - 752
  • [5] A majority affiliation based under-sampling method for class imbalance problem
    Xie, Ying
    Huang, Xian
    Qin, Feng
    Li, Fagen
    Ding, Xuyang
    INFORMATION SCIENCES, 2024, 662
  • [6] Addressing the class-imbalance and class-overlap problems by a metaheuristic-based under-sampling approach
    Soltanzadeh, Paria
    Feizi-Derakhshi, M. Reza
    Hashemzadeh, Mahdi
    PATTERN RECOGNITION, 2023, 143
  • [7] Exploratory under-sampling for class-imbalance learning
    Liu, Xu-Ying
    Wu, Jianxin
    Zhou, Zhi-Hua
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006 : 965 - 969
  • [8] Handling class overlap and imbalance using overlap driven under-sampling with balanced random forest in software defect prediction
    Dar, Abdul Waheed
    Farooq, Sheikh Umar
    INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2024
  • [9] DBIG-US: A two-stage under-sampling algorithm to face the class imbalance problem
    Guzman-Ponce, A.
    Sanchez, J. S.
    Valdovinos, R. M.
    Marcial-Romero, J. R.
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 168
  • [10] DBOS_US: a density-based graph under-sampling method to handle class imbalance and class overlap issues in software fault prediction
    Bhandari, Kirti
    Kumar, Kuldeep
    Sangal, Amrit Lal
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (15) : 22682 - 22725