A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data

Cited by: 0
Authors: Amir Reza Salehi, Majid Khedmati
Affiliation: [1] Sharif University of Technology, Department of Industrial Engineering
DOI: Not available
Abstract
This paper proposes a cluster-based Synthetic Minority Over-sampling Technique (SMOTE) both-sampling ensemble algorithm (CSBBoost) for classifying imbalanced data. The algorithm combines over-sampling, under-sampling, and several ensemble learners, namely Extreme Gradient Boosting (XGBoost), random forest, and bagging. The goal is to obtain a balanced dataset while addressing three issues: redundant data after over-sampling, information loss during under-sampling, and the random selection of samples for sampling and sample generation. The proposed algorithm is evaluated against several state-of-the-art competing algorithms on 20 benchmark imbalanced datasets, using the F1 score (the harmonic mean of precision and recall) and the area under the receiver operating characteristic curve (AUC). The results show that CSBBoost performs significantly better than the competing algorithms. A real-world dataset is also used to demonstrate the applicability of the proposed algorithm.
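The abstract does not give the individual steps of CSBBoost, so the following is only a minimal sketch of the general cluster-based both-sampling idea, not the authors' procedure. It makes several assumptions. Each class is clustered with plain k-means, using a hypothetical `n_clusters=3`. Every class is resampled to a common hypothetical `target` size, and each cluster receives a share proportional to its size. In clusters that must shrink, the points nearest the cluster mean are kept instead of a random subset. Clusters that must grow are over-sampled with SMOTE restricted to the cluster, so synthetic points are interpolated only between members of the same cluster. The function names `kmeans`, `smote_within`, and `cluster_both_sample` are hypothetical. The ensemble stage (XGBoost, random forest, bagging) is not shown.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per row of X."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X))
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest centre (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each non-empty centre to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def smote_within(Xc, n_new, k_neighbors, rng):
    """SMOTE restricted to one cluster: each synthetic point lies on the segment
    between a cluster member and one of its nearest neighbours in that cluster."""
    if n_new <= 0:
        return np.empty((0, Xc.shape[1]))
    if len(Xc) == 1:
        return np.repeat(Xc, n_new, axis=0)  # no neighbour to interpolate towards
    dist = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    kk = min(k_neighbors, len(Xc) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :kk]
    base = rng.integers(0, len(Xc), size=n_new)
    partner = neighbours[base, rng.integers(0, kk, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return Xc[base] + gap * (Xc[partner] - Xc[base])


def cluster_both_sample(X, y, target, n_clusters=3, k_neighbors=5, seed=0):
    """Resample every class to exactly `target` rows, cluster by cluster
    (hypothetical helper, an assumption rather than the paper's method)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X_parts, y_parts = [], []
    for cls in np.unique(y):
        Xc = X[y == cls]
        labels = kmeans(Xc, n_clusters, seed=seed)
        sizes = np.bincount(labels)
        # Give each cluster a share of `target` proportional to its size.
        quotas = np.floor(target * sizes / sizes.sum()).astype(int)
        for j in np.argsort(-sizes)[: target - quotas.sum()]:
            quotas[j] += 1  # hand leftover slots to the largest clusters
        for j, q in enumerate(quotas):
            members = Xc[labels == j]
            if len(members) == 0:
                continue  # empty cluster, quota is 0
            if q <= len(members):
                # Under-sample: keep the q members closest to the cluster mean.
                d = ((members - members.mean(axis=0)) ** 2).sum(axis=1)
                kept = members[np.argsort(d)[:q]]
            else:
                # Over-sample: keep every member and add in-cluster SMOTE points.
                extra = smote_within(members, q - len(members), k_neighbors, rng)
                kept = np.vstack([members, extra])
            X_parts.append(kept)
            y_parts.append(np.full(len(kept), cls))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

For example, with 200 majority and 20 minority rows, `cluster_both_sample(X, y, target=110)` returns 110 rows per class. The majority class is thinned and the minority class is enlarged, which is the "both-sampling" the abstract refers to. The balanced output could then be passed to ensemble learners such as XGBoost, random forest, or bagging. Keeping points near cluster means and interpolating only within a cluster is one plausible way to reduce information loss and redundant synthetic samples. The abstract does not confirm that CSBBoost uses exactly these rules.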