A New Under-Sampling Method to Face Class Overlap and Imbalance

Cited by: 32
Authors
Guzman-Ponce, Angelica [1]
Valdovinos, Rosa Maria [1]
Sanchez, Jose Salvador [2]
Marcial-Romero, Jose Raymundo [1]
Affiliations
[1] Univ Autonoma Estado Mexico, Fac Ingn, Cerro Coatepec S-N,Ciudad Univ, Toluca 50100, Mexico
[2] Univ Jaume 1, Dept Comp Languages & Syst, Inst New Imaging Technol, Castellon de La Plana 12071, Spain
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Issue 15
Keywords
class imbalance; class overlap; under-sampling; clustering; DBSCAN; minimum spanning tree; CLASSIFICATION; NETWORKS; IDENTIFICATION; PERFORMANCE; DATASETS; NOISY
DOI
10.3390/app10155164
Chinese Library Classification
O6 [Chemistry]
Subject classification code
0703
Abstract
Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that combines the DBSCAN clustering algorithm to remove noisy samples and clean the decision boundary with a minimum spanning tree algorithm to face the class imbalance, thus handling class overlap and imbalance simultaneously with the aim of improving the performance of classifiers. An extensive experimental study shows a significantly better behavior of the new algorithm as compared to 12 state-of-the-art under-sampling methods using three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel) on both real-life and synthetic databases.
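The two-stage idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not give the details of either stage, so several choices here are assumptions: DBSCAN is run on the majority class only, points it labels as noise are discarded, and the minimum spanning tree stage uses a hypothetical heuristic (cut the longest tree edges until the number of groups equals the minority class size, then keep the point nearest each group's centroid). The function names `dbscan_labels`, `mst_edges`, `mst_undersample` and `two_stage_undersample`, and the parameters `eps` and `min_pts`, are illustrative only.

```python
import math
from collections import deque


def dbscan_labels(points, eps, min_pts):
    """Plain DBSCAN: one cluster id per point, -1 for noise."""
    n = len(points)
    # Neighborhoods include the point itself.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (dist, i, j)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (cheapest distance to the tree, tree endpoint)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        in_tree[i] = True
        if best[i][1] >= 0:
            edges.append((best[i][0], best[i][1], i))
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[i], points[k])
                if d < best[k][0]:
                    best[k] = (d, i)
    return edges


def mst_undersample(points, n_keep):
    """ASSUMED heuristic: split the MST into n_keep groups, keep one point each."""
    if n_keep <= 0:
        return []
    if n_keep >= len(points):
        return list(points)
    # Keeping the n - n_keep shortest edges leaves exactly n_keep components.
    kept = sorted(mst_edges(points))[: len(points) - n_keep]
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for idx in range(len(points)):
        groups.setdefault(find(idx), []).append(idx)
    reps = []
    for members in groups.values():
        centroid = [sum(c) / len(members) for c in zip(*(points[m] for m in members))]
        reps.append(min(members, key=lambda m: math.dist(points[m], centroid)))
    return [points[r] for r in sorted(reps)]


def two_stage_undersample(majority, minority, eps, min_pts):
    """Stage 1: drop DBSCAN noise (assumed: majority only). Stage 2: MST reduction."""
    labels = dbscan_labels(majority, eps, min_pts)
    cleaned = [p for p, lab in zip(majority, labels) if lab != -1]
    return mst_undersample(cleaned, len(minority))
```

Called as `two_stage_undersample(majority, minority, eps=1.5, min_pts=3)` with points given as coordinate tuples, the sketch returns at most `len(minority)` majority points. Both `eps` and `min_pts` would need tuning per dataset.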
Pages: 22
Related papers
50 in total
  • [41] A Novel Hybrid Sampling Method ESMOTE+SSLM for Handling the Problem of Class Imbalance with Overlap in Financial Distress Detection
    Wang, Xiaomin
    Zhang, Rui
    Zhang, Zuoquan
    NEURAL PROCESSING LETTERS, 2023, 55 (03) : 3081 - 3105
  • [42] A Novel Hybrid Sampling Method ESMOTE+SSLM for Handling the Problem of Class Imbalance with Overlap in Financial Distress Detection
    Xiaomin Wang
    Rui Zhang
    Zuoquan Zhang
    Neural Processing Letters, 2023, 55 : 3081 - 3105
  • [43] A new digital backend in radio astronomy based on under-sampling technology
    Dong, Liang
    Wang, Min
    Bai, Zhengyao
    He, Lesheng
    FIFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2012): ALGORITHMS, PATTERN RECOGNITION AND BASIC TECHNOLOGIES, 2013, 8784
  • [44] Topographic Under-Sampling for Unbalanced Distributions
    Hamdi, Fatma
    Lebbah, Mustapha
    Bennani, Younes
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010
  • [45] Prediction of Autism-Related Genes Using a New Clustering-Based Under-Sampling Method
    Xuan Tho Dang
    Duong Hung Bui
    Thi Hong Nguyen
    Tran Quoc Vinh Nguyen
    Dang Hung Tran
    PROCEEDINGS OF 2019 11TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2019), 2019, : 209 - 214
  • [46] A Novel Evolutionary Preprocessing Method Based on Over-sampling and Under-sampling for Imbalanced Datasets
    Wong, Ginny Y.
    Leung, Frank H. F.
    Ling, Sai-Ho
    39TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY (IECON 2013), 2013, : 2354 - 2359
  • [47] Exploring class imbalance with under-sampling, over-sampling, and hybrid sampling based on Mahalanobis distance for landslide susceptibility assessment: a case study of the 2018 Iburi earthquake induced landslides in Hokkaido, Japan
    Nam, Kounghoon
    Kim, Jongtae
    Chae, Byung-Gon
    GEOSCIENCES JOURNAL, 2024, 28 (01) : 71 - 94
  • [48] A Frequency Recovering Method for Photonic Under-Sampling E-Field Measurement
    Yang, Yan
    Xie, Shuguo
    Dong, Yakai
    Wang, Tianheng
    Zhao, Xin
    IEEE SENSORS JOURNAL, 2021, 21 (12) : 13495 - 13505
  • [49] Demonstration of Measurement Method based on Under-sampling for mmWave Radio Communication Signals
    Shimoda, Masatsugu
    Uchiumi, Sou
    Kanno, Shouta
    Otani, Akihito
    2023 IEEE INTERNATIONAL INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE, I2MTC, 2023
  • [50] An Under-Sampling Method Based on Principal Component Analysis and Comprehensive Evaluation Model
    Fu Yangzhen
    Zhang Hong
    Bai Yaxin
    Sun Weixuan
    2016 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY COMPANION (QRS-C 2016), 2016, : 414 - 415