Comparison of resampling methods for dealing with imbalanced data in binary classification problem

Cited by: 2
Authors
Park, Geun U. [1]
Jun, Inkyun G. [1]
Affiliations
[1] Yonsei Univ, Div Biostat, Dept Biomed Syst Informat, Coll Med, 50-1 Yonsei Ro, Seoul 03722, South Korea
Keywords
imbalanced-learn; imbalanced binary data; under-sampling; over-sampling; NEIGHBOR; SMOTE;
DOI
10.5351/KJAS.2019.32.3.349
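Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes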
020208 ; 070103 ; 0714 ;
Abstract
A class imbalance problem arises when one class greatly outnumbers the other in binary data. Previous studies have addressed this problem by, for example, transforming the training data. In this study, we compared resampling methods, one family of approaches for handling imbalance in classification, with the aim of detecting the minority class more effectively. Through simulation, we compared a total of 20 over-sampling, under-sampling, and combined over- and under-sampling methods. Logistic regression, support vector machine, and random forest models, which are commonly used in classification problems, served as classifiers. The simulation results showed that random under-sampling (RUS) had the highest sensitivity while keeping accuracy above 0.5. The next most sensitive method was the over-sampling method adaptive synthetic sampling (ADASYN). These results suggest that RUS is suitable for detecting minority class instances. Results on several real data sets were similar to those of the simulation.
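To make the abstract's terms concrete, the following is a minimal sketch of random under-sampling (RUS) together with the sensitivity and accuracy measures used to judge it. It is not the authors' code: the function names `random_under_sample` and `sensitivity_and_accuracy`, the toy data, and the choice to balance the classes to exactly equal size are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class.

    Hypothetical helper for illustration; balancing to a 1:1 ratio is an assumption.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sample without replacement; the minority class is kept in full.
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def sensitivity_and_accuracy(y_true, y_pred, positive=1):
    """Return (sensitivity, accuracy), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    return sens, correct / len(pairs)


# Toy imbalanced data (an assumption): 95 majority rows (0), 5 minority rows (1).
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5
X_res, y_res = random_under_sample(X, y, seed=42)
```

After resampling, `X_res` and `y_res` would be passed to a classifier in place of the original training data, and sensitivity and accuracy would be computed on a held-out test set that is left at its original class ratio.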
Pages: 349-374
Number of pages: 26