A novel ensemble method for classifying imbalanced data

Times cited: 308
Authors
Sun, Zhongbin [1]
Song, Qinbao [1]
Zhu, Xiaoyan [1]
Sun, Heli [1]
Xu, Baowen [2]
Zhou, Yuming [2]
Affiliations
[1] Xi An Jiao Tong Univ, Dept Comp Sci & Technol, Xian 710049, Peoples R China
[2] Nanjing Univ, Dept Comp Sci & Technol, Nanjing 210093, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Classification; Ensemble learning; NEURAL-NETWORKS; SOFTWARE TOOL; DATA SETS; CLASSIFICATION; ALGORITHMS; ACCURACY; SMOTE; KEEL
DOI
10.1016/j.patcog.2014.11.014
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance problems have been reported to severely hinder the classification performance of many standard learning algorithms, and have attracted a great deal of attention from researchers in different fields. A number of methods, such as sampling methods, cost-sensitive learning methods, and bagging- and boosting-based ensemble methods, have therefore been proposed to solve these problems. However, because these conventional class imbalance handling methods may alter the original data distribution, they might suffer from the loss of potentially useful information, unexpected mistakes, or an increased likelihood of overfitting. We therefore propose a novel ensemble method that first converts an imbalanced data set into multiple balanced ones and then builds a number of classifiers on these balanced data sets with a specific classification algorithm. Finally, the classification results of these classifiers for new data are combined by a specific ensemble rule. In the empirical study, our method was compared with a range of class imbalance handling methods: three conventional sampling methods, one cost-sensitive learning method, six bagging- and boosting-based ensemble methods, our previous method EM1vs1, and two fuzzy-rule-based classification methods. The experimental results on 46 imbalanced data sets show that our proposed method is usually superior to the conventional imbalance handling methods when solving highly imbalanced problems. (C) 2014 Elsevier Ltd. All rights reserved.
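The abstract describes three steps (convert the imbalanced set into several balanced sets, train one classifier per set, combine their outputs with an ensemble rule) but does not say how the balanced sets are formed, which base algorithm is used, or which ensemble rule is applied. The following is a minimal Python sketch under loudly stated assumptions, not the authors' implementation: binary labels; the majority class is randomly partitioned into bins roughly the size of the minority class, and each bin is paired with all minority samples; a nearest-centroid classifier stands in for the base learner; and plain majority voting stands in for the ensemble rule. All names (`split_into_balanced_sets`, `NearestCentroid`, `fit_ensemble`, `predict_majority_vote`) are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def split_into_balanced_sets(X, y, minority_label, seed=0):
    """ASSUMPTION: random partition of the majority class into bins of about
    minority size; each bin is joined with every minority sample."""
    rng = random.Random(seed)
    minority = [(x, lab) for x, lab in zip(X, y) if lab == minority_label]
    majority = [(x, lab) for x, lab in zip(X, y) if lab != minority_label]
    rng.shuffle(majority)
    n_bins = max(1, round(len(majority) / len(minority)))
    # Strided slicing spreads the shuffled majority samples evenly over the bins.
    bins = [majority[i::n_bins] for i in range(n_bins)]
    return [b + minority for b in bins]


class NearestCentroid:
    """Hypothetical stand-in base learner: predicts the label of the closest class mean."""

    def fit(self, samples):
        by_label = {}
        for x, lab in samples:
            by_label.setdefault(lab, []).append(x)
        # Per-label centroid: the mean of each feature column.
        self.centroids = {lab: [mean(col) for col in zip(*xs)]
                          for lab, xs in by_label.items()}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def fit_ensemble(X, y, minority_label, seed=0):
    """Train one base classifier on each balanced subset."""
    subsets = split_into_balanced_sets(X, y, minority_label, seed)
    return [NearestCentroid().fit(s) for s in subsets]


def predict_majority_vote(models, x):
    """ASSUMPTION: simple majority vote as the combination rule."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]
```

For example, with 30 majority samples (label 0) and 3 minority samples (label 1), `fit_ensemble` builds 10 balanced subsets of 3 + 3 samples each, so every majority sample is used exactly once and nothing is discarded or synthesized. That property is the contrast the abstract draws with undersampling (information loss) and oversampling (overfitting risk).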
Pages: 1623-1637
Number of pages: 15