A weighted hybrid ensemble method for classifying imbalanced data

Authors
Zhao, Jiakun [1]
Jin, Ju [1]
Chen, Si [1]
Zhang, Ruifeng [1]
Yu, Bilin [2]
Liu, Qingfang [3]
Affiliations
[1] School of Software Engineering, Xi'an Jiaotong University, 710049, China
[2] School of Management, University of Science and Technology of China, 230026, China
[3] School of Mathematics and Statistics, Xi'an Jiaotong University, 710049, China
Funding
National Natural Science Foundation of China
Keywords
Benchmarking - Data mining - Classification (of information)
DOI
Not available
Abstract
Most real-world datasets are imbalanced. Data imbalance arises when the number of instances in some classes greatly exceeds the number in other classes, and it adversely affects both data mining and machine learning. Current approaches to the problem fall into data-level methods, algorithm-level methods, and hybrid methods. In this paper, we propose WHMBoost, a weighted hybrid ensemble method for classifying imbalanced data in binary classification tasks. Within the boosting framework, the method combines two data sampling methods and two base classifiers, assigning a weight to each sampling method and each base classifier so that their strengths complement one another. WHMBoost was evaluated on 40 benchmark imbalanced datasets against state-of-the-art ensemble methods such as AdaBoost, RUSBoost, and SMOTEBoost, using AUC, F-Measure, and Geometric Mean as the performance criteria. Experimental results show significant improvement over the other methods, indicating that WHMBoost is a promising and effective algorithm for imbalanced datasets. © 2020 Elsevier B.V.
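Two of the evaluation criteria named in the abstract, F-Measure and Geometric Mean, can be computed directly from confusion-matrix counts. The following is a minimal sketch of the standard definitions only; it is not taken from the paper, the function name `imbalance_metrics` and the toy labels are hypothetical, and AUC is omitted because it needs ranked scores rather than hard predictions.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, F-Measure, and G-mean for binary labels.

    Illustrative helper (hypothetical name); `positive` marks the
    minority class, which is an assumption of this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # minority-class recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0     # majority-class recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    g_mean = math.sqrt(recall * specificity)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "f_measure": f_measure, "g_mean": g_mean}


# Toy data: 2 minority (1) and 8 majority (0) instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class.
majority_only = imbalance_metrics(y_true, [0] * 10)

# A classifier that finds one of the two minority instances.
partial = imbalance_metrics(y_true, [1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
```

On this toy data both classifiers reach an accuracy of 0.8, but the majority-only classifier scores 0 on F-Measure and G-mean, which is why accuracy alone is a poor criterion for imbalanced data.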
Related papers
50 in total
  • [31] Weighted Ensemble with Dynamical Chunk Size for Imbalanced Data Streams in Nonstationary Environment
    Liu, Nini
    Zhu, Wen
    Liao, Bo
    Ren, Siqi
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE & APPLICATION TECHNOLOGY (ICCIA 2017), 2017, 74 : 364 - 367
  • [32] Classifying highly imbalanced ICU data
    Roumani, Yazan F.
    May, Jerrold H.
    Strum, David P.
    Vargas, Luis G.
    HEALTH CARE MANAGEMENT SCIENCE, 2013, 16 (02) : 119 - 128
  • [33] Lazy bagging for classifying imbalanced data
    Zhu, Xingquan
    ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 763 - 768
  • [34] Classifying highly imbalanced ICU data
    Yazan F. Roumani
    Jerrold H. May
    David P. Strum
    Luis G. Vargas
    Health Care Management Science, 2013, 16 : 119 - 128
  • [35] Entropy-based hybrid sampling ensemble learning for imbalanced data
    Dongdong, Li
    Ziqiu, Chi
    Bolu, Wang
    Zhe, Wang
    Hai, Yang
    Wenli, Du
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2021, 36 (07) : 3039 - 3067
  • [36] An ensemble imbalanced classification method based on model dynamic selection driven by data partition hybrid sampling
    Gao, Xin
    Ren, Bing
    Zhang, Hao
    Sun, Bohao
    Li, Junliang
    Xu, Jianhang
    He, Yang
    Li, Kangsheng
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 160
  • [37] Ensemble learning method based on CNN for class imbalanced data
    Xin Zhong
    Nan Wang
    The Journal of Supercomputing, 2024, 80 : 10090 - 10121
  • [38] A Combination of Resampling and Ensemble Method for Text Classification on Imbalanced Data
    Feng, Haijun
    Qin, Wen
    Wang, Huijing
    Li, Yi
    Hu, Guangwu
    BIG DATA, BIGDATA 2021, 2022, 12988 : 3 - 16
  • [39] Highly imbalanced fault classification of wind turbines using data resampling and hybrid ensemble method approach
    Chatterjee, Subhajit
    Byun, Yung-Cheol
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [40] IIvotes ensemble for imbalanced data
    Blaszczynski, Jerzy
    Deckert, Magdalena
    Stefanowski, Jerzy
    Wilk, Szymon
    INTELLIGENT DATA ANALYSIS, 2012, 16 (05) : 777 - 801