Using boosting tree to learn imbalanced data

Cited by: 0
Authors
Ridong Y. [1]
Shiyu Z. [1]
Lin L. [2]
Zhe W. [2]
Yi Z. [1]
Affiliations
[1] Zhongshan School of Medicine, Sun Yat-Sen University, 510080, Guangdong, China
[2] College of Public Health, Xinjiang Medical University, 830001, Xinjiang, China
Funding
National Natural Science Foundation of China;
Keywords
BT; Class imbalance; Data sampling; Machine learning;
DOI
10.19682/j.cnki.1005-8885.2019.1005
Abstract
In machine learning, class imbalance is a persistent problem: one class contains far more samples than the others. This imbalance biases the classifier toward the majority class, which degrades its performance on the minority class. We propose an improved boosting tree (BT) algorithm for learning from imbalanced data, called cost BT. In each iteration of cost BT, only the weights of the misclassified minority-class samples are increased, and the error rate in the weight formula of the base classifier is replaced by 1 minus the F-measure. In this study, the performance of cost BT is compared with other known methods on 9 public data sets. The compared methods are the decision tree and random forest algorithms, each combined with sampling techniques such as the synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, the adaptive synthetic sampling approach (ADASYN) and one-sided selection. Cost BT outperformed the compared methods in F-measure, G-mean and area under the curve (AUC), and achieved superior performance on 6 of the 9 data sets. By increasing only the weights of the misclassified minority-class samples in each iteration of the BT, cost BT raises the effective proportion of the minority class in the training distribution and thereby improves the prediction performance of the base classifiers. In addition, weighting the base classifiers by F-measure is helpful to the ensemble decisions. © 2019, Beijing University of Posts and Telecommunications. All rights reserved.
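The abstract describes two modifications to an AdaBoost-style loop: only misclassified minority-class samples receive heavier weights, and each base learner's vote weight is computed from 1 minus the F-measure instead of the weighted error rate. A minimal sketch of that idea, assuming decision stumps as base learners and binary labels; the function names (`cost_bt_fit`, `cost_bt_predict`), the round count, the stump depth and the exact update formula are illustrative choices, not taken from the paper:

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier

def cost_bt_fit(X, y, n_rounds=10, minority_label=1):
    """AdaBoost-style loop with the two changes described in the abstract:
    (a) only misclassified minority-class samples get heavier weights,
    (b) each learner's vote weight uses 1 - F-measure as the 'error'."""
    n = len(y)
    w = np.full(n, 1.0 / n)                         # uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        f = f1_score(y, pred, pos_label=minority_label, zero_division=0)
        err = np.clip(1.0 - f, 1e-10, 1.0 - 1e-10)  # "error" = 1 - F-measure
        alpha = 0.5 * np.log((1.0 - err) / err)     # vote weight of this learner
        # increase weights ONLY for misclassified minority-class samples
        # (clamped at 0 so weights are never decreased, per the abstract)
        wrong_minority = (pred != y) & (y == minority_label)
        w[wrong_minority] *= np.exp(max(alpha, 0.0))
        w /= w.sum()                                # renormalize
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def cost_bt_predict(X, learners, alphas, labels=(0, 1)):
    # weighted vote: each stump adds its alpha to the class it predicts
    votes = np.zeros((len(X), len(labels)))
    for stump, alpha in zip(learners, alphas):
        pred = stump.predict(X)
        for j, lab in enumerate(labels):
            votes[pred == lab, j] += alpha
    return np.asarray(labels)[votes.argmax(axis=1)]
```

Because only misclassified minority samples gain weight, successive stumps concentrate on the minority class, which is the mechanism the abstract credits for the improved minority-class performance.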
Pages: 43-51
Page count: 8
Citation
Yang Ridong, Zhang Shiyu, Li Lin, Wang Zhe, Zhou Yi. Using boosting tree to learn imbalanced data. The Journal of China Universities of Posts and Telecommunications, 2019, 26 (02): 43-51.