A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data

Cited by: 0
Authors
Amir Reza Salehi
Majid Khedmati
Affiliation
[1] Sharif University of Technology, Department of Industrial Engineering
Abstract
In this paper, a Cluster-based SMOTE (Synthetic Minority Oversampling Technique) Both-sampling (CSBBoost) ensemble algorithm is proposed for classifying imbalanced data. The algorithm combines over-sampling, under-sampling, and several ensemble algorithms, including Extreme Gradient Boosting (XGBoost), random forest, and bagging, to obtain a balanced dataset and to address issues such as data redundancy after over-sampling, information loss in under-sampling, and random selection of samples for sampling and sample generation. The performance of the proposed algorithm is evaluated against state-of-the-art competing algorithms on 20 benchmark imbalanced datasets in terms of the F1 score (the harmonic mean of precision and recall) and the area under the receiver operating characteristic curve (AUC). Based on the results, the proposed CSBBoost algorithm performs significantly better than the competing algorithms. In addition, a real-world dataset is used to demonstrate the applicability of the proposed algorithm.
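The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names `f1_score` and `roc_auc`, the toy labels, and the toy scores are all assumptions made up for illustration. The sketch shows why these measures are preferred over plain accuracy when the positive (minority) class is rare.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 majority (0) samples and 2 minority (1) samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.7, 0.9, 0.35]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                  # 0.8
print(f1_score(y_true, y_pred))  # 0.5
print(roc_auc(y_true, scores))   # 0.875
```

On this toy data the accuracy is 0.8 even though only one of the two minority samples is detected, while the F1 score of 0.5 reflects the weak minority-class performance.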
Related papers
50 in total
  • [31] Adaptive Ensemble Method Based on Spatial Characteristics for Classifying Imbalanced Data
    Wang, Lei
    Zhao, Lei
    Gui, Guan
    Zheng, Baoyu
    Huang, Ruochen
    SCIENTIFIC PROGRAMMING, 2017, 2017
  • [32] Dealing with Imbalanced Dataset: A Re-sampling Method Based on the Improved SMOTE Algorithm
    Xue, Wei
    Zhang, Jing
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2016, 45 (04) : 1160 - 1172
  • [33] Autonomic active learning strategy using cluster-based ensemble classifier for concept drifts in imbalanced data stream
    Halder, Bohnishikha
    Hasan, K. M. Azharul
    Amagasa, Toshiyuki
    Ahmed, Md Manjur
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [34] A Cluster-Based Boosting Algorithm for Bankruptcy Prediction in a Highly Imbalanced Dataset
    Le, Tuong
    Son, Le Hoang
    Minh Thanh Vo
    Lee, Mi Young
    Baik, Sung Wook
    SYMMETRY-BASEL, 2018, 10 (07)
  • [35] Ensemble of Classifiers Based on Multiobjective Genetic Sampling for Imbalanced Data
    Fernandes, Everlandio R. Q.
    de Carvalho, Andre C. P. L. F.
    Yao, Xin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (06) : 1104 - 1115
  • [36] Hierarchical cluster-based IELM for financial distress prediction with imbalanced data
    Amal Ibrahim Al Ali
    S. Sheeja Rani
    P. V. Pravija Raj
    Ahmed M. Khedr
    Neural Computing and Applications, 2025, 37 (5) : 2925 - 2943
  • [37] Kernel cluster-based ensemble SVM approaches for unbalanced data
    Tao, X. (taoxinmin@hrbeu.edu.cn), 2013, Editorial Board of Journal of Harbin Engineering (34)
  • [38] DDSC-SMOTE: an imbalanced data oversampling algorithm based on data distribution and spectral clustering
    Li, Xinqi
    Liu, Qicheng
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (12) : 17760 - 17789
  • [39] Imbalanced Hyperspectral Image Classification With an Adaptive Ensemble Method Based on SMOTE and Rotation Forest With Differentiated Sampling Rates
    Feng, Wei
    Huang, Wenjiang
    Bao, Wenxing
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2019, 16 (12) : 1879 - 1883
  • [40] Entropy-based hybrid sampling ensemble learning for imbalanced data
    Dongdong, Li
    Ziqiu, Chi
    Bolu, Wang
    Zhe, Wang
    Hai, Yang
    Wenli, Du
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2021, 36 (07) : 3039 - 3067