Does the Inclusion of Data Sampling Improve the Performance of Boosting Algorithms on Imbalanced Bioinformatics Data?

Times cited: 0
Authors
Fazelpour, Alireza [1 ]
Khoshgoftaar, Taghi M. [1 ]
Dittman, David J. [1 ]
Napolitano, Amri [1 ]
Affiliation
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Boosting; data sampling; ensemble learning; class imbalance; bioinformatics; CHEMOTHERAPY; PREDICTOR;
DOI
10.1109/ICMLA.2015.23
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Bioinformatics datasets have many challenging characteristics, such as class imbalance, which adversely affects the performance of supervised classification models built on these datasets. Data mining techniques such as ensemble learning and data sampling can be deployed to alleviate the problem and improve classification performance. In this study, we sought to determine whether including data sampling within the ensemble framework can further improve the performance of classification models. To this end, we performed an experimental study using two new hybrid ensemble techniques: one integrates feature selection within the boosting process, and the other incorporates random under-sampling followed by feature selection within the boosting framework. The study also covered two learners, three forms of feature rankers, and four feature subset sizes on 15 highly imbalanced bioinformatics datasets. Our results and statistical analysis demonstrate that the difference between the two boosting methods is not statistically significant. Therefore, as the inclusion of data sampling has no significant positive effect on the performance of ensemble classifiers, it is not required to achieve maximum classification performance. To our knowledge, this is the first empirical study to examine the effect of data sampling, specifically random under-sampling, on the classification performance of boosting algorithms for highly imbalanced bioinformatics data.
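The random under-sampling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_undersample`, the `minority_ratio` parameter, and its default of 0.5 are assumptions made for illustration, and the sketch handles binary labels only. In a boosting variant, a step like this would typically be applied to the training data before the weak learner is fit in each iteration.

```python
import random
from collections import defaultdict


def random_undersample(labels, minority_ratio=0.5, seed=0):
    """Return the sorted indices kept after randomly discarding
    majority-class instances (hypothetical helper, binary labels only).

    minority_ratio is the desired share of minority instances in the
    result; 0.5 is an illustrative default, not a value from the paper.
    """
    rng = random.Random(seed)

    # Group instance indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    if len(by_class) != 2:
        raise ValueError("this sketch handles binary labels only")

    # The class with fewer instances is treated as the minority.
    minority, majority = sorted(by_class, key=lambda c: len(by_class[c]))
    n_min = len(by_class[minority])

    # Keep just enough majority instances to reach the desired ratio,
    # never more than are available.
    n_keep = min(
        len(by_class[majority]),
        round(n_min * (1 - minority_ratio) / minority_ratio),
    )
    kept_majority = rng.sample(by_class[majority], n_keep)

    # All minority instances are retained; only the majority is reduced.
    return sorted(by_class[minority] + kept_majority)


# Usage: 3 positive and 17 negative instances.
labels = [1] * 3 + [0] * 17
kept = random_undersample(labels)
balanced_labels = [labels[i] for i in kept]
```

Only majority-class instances are discarded, so every minority instance remains available to the learner; the trade-off is that some majority-class information is lost in each sample.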
Pages: 527-534
Page count: 8
Related papers
50 in total
  • [11] Deep Learning and Data Sampling with Imbalanced Big Data
    Johnson, Justin M.
    Khoshgoftaar, Taghi M.
    2019 IEEE 20TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2019), 2019, : 175 - 183
  • [12] Is Gene Selection Enough for Imbalanced Bioinformatics Data?
    Abu Shanab, Ahmad
    Khoshgoftaar, Taghi M.
    2018 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2018, : 346 - 355
  • [14] Multi-class Boosting for Imbalanced Data
    Fernandez-Baldera, Antonio
    Buenaposada, Jose M.
    Baumela, Luis
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 57 - 64
  • [15] Using boosting tree to learn imbalanced data
    Yang Ridong
    Zhang Shiyu
    Li Lin
    Wang Zhe
    Zhou Yi
    The Journal of China Universities of Posts and Telecommunications, 2019, 26 (02) : 43 - 51
  • [16] A review of boosting methods for imbalanced data classification
    Li, Qiujie
    Mao, Yaobin
    PATTERN ANALYSIS AND APPLICATIONS, 2014, 17 (04) : 679 - 693
  • [17] An Imbalanced Data Classification Algorithm Based on Boosting
    Li Qiu-Jie
    Mao Yao-Bin
    Wang Zhi-Quan
    2011 30TH CHINESE CONTROL CONFERENCE (CCC), 2011, : 3053 - 3057
  • [18] A review of boosting methods for imbalanced data classification
    Qiujie Li
    Yaobin Mao
    Pattern Analysis and Applications, 2014, 17 : 679 - 693
  • [19] Online Bagging and Boosting for Imbalanced Data Streams
    Wang, Boyu
    Pineau, Joelle
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (12) : 3353 - 3366
  • [20] Using boosting tree to learn imbalanced data
    Ridong Y.
    Shiyu Z.
    Lin L.
    Zhe W.
    Yi Z.
    Journal of China Universities of Posts and Telecommunications, 2019, 26 (02) : 43 - 51