A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data

Cited by: 0
Authors
Amir Reza Salehi
Majid Khedmati
Affiliation
[1] Sharif University of Technology, Department of Industrial Engineering
Abstract
This paper proposes a Cluster-based Synthetic Minority Over-sampling Technique (SMOTE) Both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data. The algorithm combines over-sampling, under-sampling, and several ensemble algorithms, including Extreme Gradient Boosting (XGBoost), random forest, and bagging, to obtain a balanced dataset and to address issues such as redundant data after over-sampling, information loss in under-sampling, and random sample selection for sampling and sample generation. The proposed algorithm is evaluated against state-of-the-art competing algorithms on 20 benchmark imbalanced datasets in terms of the F1 score (the harmonic mean of precision and recall) and the area under the receiver operating characteristic curve (AUC). The results show that CSBBoost performs significantly better than the competing algorithms. In addition, a real-world dataset is used to demonstrate the applicability of the proposed algorithm.
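The both-sampling idea named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CSBBoost procedure: the abstract does not specify the clustering step, the sample-selection rules, or the target class size, so the clustering is omitted here and the target (the mean of the two class sizes) is an assumption. The function names `smote_like`, `both_sample`, and `f1_score` are hypothetical, and the over-sampler needs at least two minority points.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours (SMOTE-style)."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority points.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def both_sample(majority, minority, rng=None):
    """Balance two classes at the mean of their sizes (assumed target):
    randomly under-sample the majority, over-sample the minority."""
    if rng is None:
        rng = random.Random(0)
    target = (len(majority) + len(minority)) // 2
    kept_majority = rng.sample(majority, target)
    grown_minority = minority + smote_like(
        minority, target - len(minority), rng=rng
    )
    return kept_majority, grown_minority


def f1_score(y_true, y_pred, positive=1):
    """F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up for illustration): 10 majority and 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(10)]
minority = [(20.0, 20.0), (21.0, 20.5), (20.5, 22.0)]
kept, grown = both_sample(majority, minority, random.Random(42))
print(len(kept), len(grown))
```

In a fuller pipeline, the balanced sets would be passed to an ensemble learner such as XGBoost, random forest, or bagging, and the predictions scored with F1 and AUC as the abstract describes.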
Related papers
50 in total
  • [11] A histogram SMOTE-based sampling algorithm with incremental learning for imbalanced data classification
    Liaw, Lawrence Chuin Ming
    Tan, Shing Chiang
    Goh, Pey Yun
    Lim, Chee Peng
    INFORMATION SCIENCES, 2025, 686
  • [12] A new sampling method for classifying imbalanced data based on support vector machine ensemble
    Jian, Chuanxia
    Gao, Jian
    Ao, Yinhui
    NEUROCOMPUTING, 2016, 193: 115-122
  • [13] A selective evolutionary heterogeneous ensemble algorithm for classifying imbalanced data
    An, Xiaomeng
    Xu, Sen
    ELECTRONIC RESEARCH ARCHIVE, 2023, 31(05): 2733-2757
  • [14] MSFSS: A whale optimization-based multiple sampling feature selection stacking ensemble algorithm for classifying imbalanced data
    Wang, Shuxiang
    Shao, Changbin
    Xu, Sen
    Yang, Xibei
    Yu, Hualong
    AIMS MATHEMATICS, 2024, 9(07): 17504-17530
  • [15] A Novel Cluster based Over-sampling Approach for Classifying Imbalanced Sentiment Data
    Chang, Jing-Rong
    Chen, Long-Sheng
    Lin, Li-Wei
    IAENG International Journal of Computer Science, 2021, 48(04)
  • [16] Comparison of Cluster-Based Sampling Approaches for Imbalanced Data of Crashes Involving Large Trucks
    Tahfim, Syed As-Sadeq
    Chen, Yan
    INFORMATION, 2024, 15(03)
  • [17] Feature Selection and Ensemble Hierarchical Cluster-based Under-sampling Approach for Extremely Imbalanced Datasets
    Soltani, Sima
    Sadri, Javad
    Torshizi, Hassan Ahmadi
    2011 1ST INTERNATIONAL ECONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2011: 166-171
  • [18] Cluster-Based Minority Over-Sampling for Imbalanced Datasets
    Puntumapon, Kamthorn
    Rakthamamon, Thanawin
    Waiyamai, Kitsana
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D(12): 3101-3109
  • [19] A New Optimal Ensemble Algorithm Based on SVDD Sampling for Imbalanced Data Classification
    Pirgazi, Jamshid
    Pirmohammadi, Abbas
    Shams, Reza
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35(06)
  • [20] EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data
    Jung, Ilok
    Ji, Jaewon
    Cho, Changseob
    ELECTRONICS, 2022, 11(09)