A New Under-Sampling Method to Face Class Overlap and Imbalance

Cited by: 32
Authors
Guzman-Ponce, Angelica [1]
Valdovinos, Rosa Maria [1]
Sanchez, Jose Salvador [2]
Marcial-Romero, Jose Raymundo [1]
Affiliations
[1] Univ Autonoma Estado Mexico, Fac Ingn, Cerro Coatepec S-N,Ciudad Univ, Toluca 50100, Mexico
[2] Univ Jaume 1, Dept Comp Languages & Syst, Inst New Imaging Technol, Castellon de La Plana 12071, Spain
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 15
Keywords
class imbalance; class overlap; under-sampling; clustering; DBSCAN; minimum spanning tree; CLASSIFICATION; NETWORKS; IDENTIFICATION; PERFORMANCE; DATASETS; NOISY
DOI
10.3390/app10155164
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that combines the DBSCAN clustering algorithm to remove noisy samples and clean the decision boundary with a minimum spanning tree algorithm to face the class imbalance, thus handling class overlap and imbalance simultaneously with the aim of improving the performance of classifiers. An extensive experimental study shows a significantly better behavior of the new algorithm as compared to 12 state-of-the-art under-sampling methods using three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel) on both real-life and synthetic databases.
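The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how the minimum spanning tree is used, so the stage-2 rule here (repeatedly drop one endpoint of the shortest MST edge until the majority class matches the minority class in size) is an assumption made for illustration only. The function names, `eps`, and `min_pts` are hypothetical, and the DBSCAN step is a simplified pure-Python noise test rather than a library call.

```python
import math


def dbscan_noise(points, eps, min_pts):
    """Indices of points DBSCAN would label as noise: not a core point
    and not within eps of any core point (neighbor counts include self)."""
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    return {
        i for i, nb in enumerate(neighbors)
        if i not in core and not any(j in core for j in nb)
    }


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.
    Returns a list of (weight, u, v) edges of the minimum spanning tree."""
    if len(points) < 2:
        return []
    # best[j] = (distance to the tree, tree vertex achieving it)
    best = {j: (math.dist(points[0], points[j]), 0)
            for j in range(1, len(points))}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        w, u = best.pop(j)
        edges.append((w, u, j))
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


def two_stage_undersample(majority, minority, eps=1.0, min_pts=3):
    """Stage 1: drop majority samples flagged as DBSCAN noise.
    Stage 2 (ASSUMED rule, not taken from the paper): while the majority
    class is larger than the minority class, rebuild the MST and remove
    one endpoint of its shortest edge, i.e. the most redundant sample."""
    noise = dbscan_noise(majority, eps, min_pts)
    kept = [p for i, p in enumerate(majority) if i not in noise]
    while len(kept) > max(len(minority), 1):
        _, _, v = min(mst_edges(kept))
        kept.pop(v)
    return kept


majority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (2, 0.5), (10, 10)]
minority = [(5, 5), (5, 6), (6, 5)]
balanced = two_stage_undersample(majority, minority, eps=1.5, min_pts=3)
print(len(balanced), balanced)
```

Rebuilding the tree after every removal keeps the sketch short at the cost of speed; it is only meant for toy inputs like the one shown.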
Pages: 22
Related papers
50 in total
  • [31] A Hybrid Under-Sampling Method (HUSBoost) to Classify Imbalanced Data
    Popel, Mahmudul Hasan
    Hasib, Khan Md
    Habib, Syed Ahsan
    Shah, Faisal Muhammad
    2018 21ST INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2018
  • [32] A Method for Under-Sampling Modulation Pattern Recognition in Satellite Communication
    Wen, Tao
    Chen, Qi
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, CSPS 2018, VOL II: SIGNAL PROCESSING, 2020, 516 : 932 - 944
  • [33] Under-sampling method based on sample weight for imbalanced data
    Xiong B.
    Wang G.
    Deng W.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2016, 53 (11): 2613 - 2622
  • [34] Multilabel Over-sampling and Under-sampling with Class Alignment for Imbalanced Multilabel Text Classification
    Taha, Adil Yaseen
    Tiun, Sabrina
    Abd Rahman, Abdul Hadi
    Sabah, Ali
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2021, 20 (03): 423 - 456
  • [35] A New Hybrid Under-sampling Approach to Imbalanced Classification Problems
    Peng, Chun-Yang
    Park, You-Jin
    APPLIED ARTIFICIAL INTELLIGENCE, 2022, 36 (01)
  • [36] An empirical analysis of under-sampling techniques to balance a protein structural class dataset
    de Souto, Marcilio C. P.
    Bittencourt, Valnaide G.
    Costa, Jose A. F.
    NEURAL INFORMATION PROCESSING, PT 3, PROCEEDINGS, 2006, 4234 : 21 - 29
  • [37] Under-sampling class imbalanced datasets by combining clustering analysis and instance selection
    Tsai, Chih-Fong
    Lin, Wei-Chao
    Hu, Ya-Han
    Yao, Guan-Ting
    INFORMATION SCIENCES, 2019, 477 : 47 - 54
  • [38] A Cluster-Based Under-Sampling Algorithm for Class-Imbalanced Data
    Guzman-Ponce, A.
    Valdovinos, R. M.
    Sanchez, J. S.
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2020, 2020, 12344 : 299 - 311
  • [39] An Under-sampling Method Based on Fuzzy Logic for Large Imbalanced Dataset
    Wong, Ginny Y.
    Leung, Frank H. F.
    Ling, Sai-Ho
    2014 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2014: 1248 - 1252
  • [40] AN IMBALANCED DATA CLASSIFICATION METHOD BASED ON AUTOMATIC CLUSTERING UNDER-SAMPLING
    Deng, Xiaoheng
    Zhong, Weijian
    Ren, Ju
    Zeng, Detian
    Zhang, Honggang
    2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016