A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data

Times cited: 0
Authors
Amir Reza Salehi
Majid Khedmati
Affiliation
[1] Department of Industrial Engineering, Sharif University of Technology
DOI
Not available
Abstract
In this paper, a cluster-based synthetic minority oversampling technique (SMOTE) both-sampling (CSBBoost) ensemble algorithm is proposed for classifying imbalanced data. The algorithm combines over-sampling, under-sampling, and several ensemble algorithms, including Extreme Gradient Boosting (XGBoost), random forest, and bagging. This combination produces a balanced dataset and addresses three issues: redundant data after over-sampling, information loss in under-sampling, and random selection of samples for sampling and sample generation. The proposed algorithm is evaluated on 20 benchmark imbalanced datasets and compared with several state-of-the-art competing algorithms. The comparison uses two measures: the F1 score, which is the harmonic mean of precision and recall, and the area under the receiver operating characteristic curve (AUC). Based on the results, the proposed CSBBoost algorithm performs significantly better than the competing algorithms. In addition, a real-world dataset is used to demonstrate the applicability of the proposed algorithm.
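The abstract describes the sampling stage only at a high level. The following Python sketch shows one plausible form of cluster-based both-sampling under several stated assumptions. Each class is clustered with plain k-means. The majority class is under-sampled by drawing from every cluster in proportion to its size. The minority class is over-sampled with SMOTE-style interpolation restricted to each minority cluster. The helper names (`kmeans`, `allocate`, `smote`, `cluster_both_sample`) are hypothetical and do not come from the paper. The same applies to the "meet halfway" target size and the parameter values. The ensemble stage (XGBoost, random forest, bagging) is omitted.

```python
import random
from math import dist


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    k = min(k, len(points))
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it in place if the cluster is empty.
        centers = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]


def allocate(clusters, total):
    """Hypothetical helper: split `total` samples across clusters by cluster size."""
    n = sum(len(c) for c in clusters)
    shares = [total * len(c) // n for c in clusters]
    for i in range(total - sum(shares)):  # hand out the rounding remainder
        shares[i % len(shares)] += 1
    return shares


def smote(cluster, n_new, rng, k_neighbors=3):
    """SMOTE-style interpolation restricted to the points of one cluster."""
    if len(cluster) < 2:
        return [cluster[0]] * n_new  # cannot interpolate; fall back to duplication
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(cluster))
        base = cluster[i]
        # Nearest neighbours of `base` inside the same cluster (excluding itself).
        others = sorted((j for j in range(len(cluster)) if j != i),
                        key=lambda j: dist(base, cluster[j]))
        nb = cluster[rng.choice(others[:k_neighbors])]
        gap = rng.random()
        out.append(tuple(b + gap * (v - b) for b, v in zip(base, nb)))
    return out


def cluster_both_sample(majority, minority, k=3, seed=0):
    """Under-sample the majority and over-sample the minority to a common size.

    Assumes len(majority) >= len(minority); the halfway target is an assumption.
    """
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2
    maj_clusters = kmeans(majority, k, rng)
    new_majority = []
    for cl, share in zip(maj_clusters, allocate(maj_clusters, target)):
        new_majority += rng.sample(cl, share)  # keep a share of every cluster
    min_clusters = kmeans(minority, k, rng)
    new_minority = list(minority)  # original minority points are all kept
    for cl, share in zip(min_clusters, allocate(min_clusters, target - len(minority))):
        new_minority += smote(cl, share, rng)
    return new_majority, new_minority


# Usage on toy 2-D data: 100 majority points vs. 20 minority points.
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(20)]
maj, mnr = cluster_both_sample(majority, minority)
print(len(maj), len(mnr))  # prints: 60 60
```

Drawing samples from every cluster, rather than from the class as a whole, is one way to limit the information loss and purely random sample selection the abstract mentions. In a full pipeline, the resampled sets would then be passed to the ensemble classifiers.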
Source
Scientific Reports, 2024, 14(1)