Improving undersampling-based ensemble with rotation forest for imbalanced problem

Cited by: 8
Authors
Guo, Huaping [1]
Diao, Xiaoyu [1]
Liu, Hongbing [1]
Affiliations
[1] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Undersampling; ensemble; rotation forest; imbalanced problem; SMOTE; ALGORITHMS;
DOI
10.3906/elk-1805-159
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
As one of the most challenging and attractive issues in pattern recognition and machine learning, the imbalanced problem has attracted increasing attention. In two-class imbalanced data, one class (the majority class) is much larger than the other (the minority class), which makes the constructed models focus on the majority class and ignore or even misclassify the examples of the minority class. The undersampling-based ensemble, which learns individual classifiers from undersampled balanced data, is an effective method to cope with class-imbalanced data. The problem with this method is that the dataset used to train each classifier is notably small; thus, generating individual classifiers with high performance from the limited data is key to the success of the method. In this paper, rotation forest (an ensemble method) is used to improve the performance of the undersampling-based ensemble on the imbalanced problem because rotation forest has higher performance than other ensemble methods such as bagging, boosting, and random forest, particularly for small-sized data. In addition, rotation forest is more sensitive to the sampling technique than some robust methods including SVM and neural networks; thus, it is easier to create diverse individual classifiers using rotation forest. Two versions of the improved undersampling-based ensemble method are implemented: 1) undersampling subsets from the majority class and learning each classifier using rotation forest on the data obtained by combining each subset with the minority class and 2) the same as the first method, except that, after each classifier is learned, the majority class examples that are correctly classified with high confidence are removed from further consideration.
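The two versions described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the base learner here is a simple nearest-centroid classifier standing in for rotation forest, and the names `CentroidClassifier`, `undersampling_ensemble`, `predict`, and the `confidence` threshold of 0.9 are all assumptions made for the example. Labels are assumed to be 1 for the minority class and 0 for the majority class.

```python
import math
import random


class CentroidClassifier:
    """Stand-in base learner (NOT rotation forest): nearest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def minority_score(self, x):
        # Soft score in [0, 1]; closer to the minority centroid gives a higher score.
        d_min = math.dist(x, self.centroids[1])
        d_maj = math.dist(x, self.centroids[0])
        total = d_min + d_maj
        return d_maj / total if total > 0 else 0.5


def undersampling_ensemble(X, y, n_members=5, prune=False, confidence=0.9,
                           make_learner=CentroidClassifier, seed=0):
    """prune=False sketches version 1; prune=True sketches version 2."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    members = []
    for _ in range(n_members):
        if not majority:
            break  # every majority example has been pruned away
        # Draw a majority subset no larger than the minority class.
        k = min(len(minority), len(majority))
        subset = rng.sample(majority, k)
        labels = [0] * k + [1] * len(minority)
        model = make_learner().fit(subset + minority, labels)
        members.append(model)
        if prune:
            # Assumed pruning rule: drop majority examples that the new member
            # already assigns to the majority class with high confidence.
            majority = [x for x in majority
                        if 1.0 - model.minority_score(x) < confidence]
    return members


def predict(members, x):
    # Average the members' minority scores and threshold at 0.5.
    avg = sum(m.minority_score(x) for m in members) / len(members)
    return 1 if avg >= 0.5 else 0
```

Any classifier exposing `fit` and a minority-class score could be passed as `make_learner`; a rotation forest implementation would take the place of the stand-in here.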
The experimental results show that the proposed methods perform significantly better on recall, g-mean, f-measure, and AUC than other state-of-the-art methods on 30 datasets with various data distributions and different imbalance ratios.
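For readers unfamiliar with the measures named above, the following sketch computes them under their commonly used definitions, with the minority class treated as the positive class (label 1). The record does not state the exact definitions used in the paper, so these formulas and the function names `imbalance_metrics` and `auc` are assumptions for illustration.

```python
import math


def imbalance_metrics(y_true, y_pred):
    """Recall, g-mean, and F-measure with the minority class as label 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "recall": recall,
        # Geometric mean of the accuracy on each class.
        "g_mean": math.sqrt(recall * specificity),
        "f_measure": f_measure,
    }


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike plain accuracy, the g-mean drops to zero when every minority example is misclassified, which is why it is often reported for imbalanced data.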
Pages: 1371-1386
Page count: 16
Related papers
50 in total
  • [31] Clustering-based undersampling in class-imbalanced data
    Lin, Wei-Chao
    Tsai, Chih-Fong
    Hu, Ya-Han
    Jhang, Jing-Shang
    INFORMATION SCIENCES, 2017, 409 : 17 - 26
  • [32] Consensus Clustering-Based Undersampling Approach to Imbalanced Learning
    Onan, Aytug
    SCIENTIFIC PROGRAMMING, 2019, 2019
  • [33] Undersampling method based on minority class density for imbalanced data
    Sun, Zhongqiang
    Ying, Wenhao
    Zhang, Wenjin
    Gong, Shengrong
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [34] A new rotation forest ensemble algorithm
    Chenglin Wen
    Tingting Huai
    Qinghua Zhang
    Zhihuan Song
    Feilong Cao
    International Journal of Machine Learning and Cybernetics, 2022, 13 : 3569 - 3576
  • [35] A new rotation forest ensemble algorithm
    Wen, Chenglin
    Huai, Tingting
    Zhang, Qinghua
    Song, Zhihuan
    Cao, Feilong
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, 13 (11) : 3569 - 3576
  • [36] Incoherent Undersampling-Based Waveform Reconstruction Using a Time-Domain Zero-Crossing Metric
    Bhatta, Debesh
    Tzou, Nicholas
    Wells, Joshua W.
    Hsiao, Sen-Wen
    Chatterjee, Abhijit
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2015, 23 (11) : 2357 - 2370
  • [37] An ensemble-based model for two-class imbalanced financial problem
    Liao, Jui-Jung
    Shih, Ching-Hui
    Chen, Tai-Feng
    Hsu, Ming-Fu
    ECONOMIC MODELLING, 2014, 37 : 175 - 183
  • [38] Investigation of Rotation Forest Ensemble Method Using Genetic Fuzzy Systems for a Regression Problem
    Lasota, Tadeusz
    Telec, Zbigniew
    Trawinski, Bogdan
    Trawinski, Grzegorz
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2012), PT I, 2012, 7196 : 393 - 402
  • [39] Exploring Maximum Tree Depth and Random Undersampling in Ensemble Trees to Optimize the Classification of Imbalanced Big Data
    Hancock J.T., III
    Khoshgoftaar T.M.
    SN Computer Science, 4 (5)
  • [40] SMOTE-Based Weighted Deep Rotation Forest for the Imbalanced Hyperspectral Data Classification
    Quan, Yinghui
    Zhong, Xian
    Feng, Wei
    Chan, Jonathan Cheung-Wai
    Li, Qiang
    Xing, Mengdao
    REMOTE SENSING, 2021, 13 (03) : 1 - 25