Does the Inclusion of Data Sampling Improve the Performance of Boosting Algorithms on Imbalanced Bioinformatics Data?

Times cited: 0
Authors
Fazelpour, Alireza [1]
Khoshgoftaar, Taghi M. [1]
Dittman, David J. [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Boosting; data sampling; ensemble learning; class imbalance; bioinformatics; chemotherapy; predictor
DOI
10.1109/ICMLA.2015.23
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Bioinformatics datasets have many challenging characteristics, such as class imbalance, which adversely affects the performance of supervised classification models built on these datasets. Data mining techniques such as ensemble learning and data sampling can be deployed to alleviate the problem and improve classification performance. In this study, we sought to determine whether including data sampling within the ensemble framework can further improve the performance of classification models. To this end, we performed an experimental study using two new hybrid ensemble techniques (one integrates feature selection within the boosting process; the other incorporates random under-sampling followed by feature selection within the boosting framework), two learners, three forms of feature rankers, and four feature subset sizes on 15 highly imbalanced bioinformatics datasets. Our results and statistical analysis demonstrate that the difference between the two boosting methods is not statistically significant. Therefore, because the inclusion of data sampling has no significant positive effect on the performance of ensemble classifiers, it is not required to achieve maximum classification performance. To our knowledge, this is the first empirical study to examine whether data sampling, specifically random under-sampling, enhances the classification performance of boosting algorithms on highly imbalanced bioinformatics data.
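The two hybrid techniques compared in the abstract differ only in whether a random under-sampling step precedes feature selection in each boosting round. The following is a minimal, dependency-free sketch of that idea, not the authors' implementation. The abstract does not name the learners or rankers, so the decision-stump learner, the class-mean-difference ranker, the AdaBoost-style weight update, and all function names (`random_undersample`, `rank_features`, `fit_stump`, `boost`, `predict`) are assumptions chosen for illustration. Labels are assumed to be +1 and -1.

```python
import math
import random

def random_undersample(y, rng):
    """Keep every minority instance plus an equal-sized random subset of the majority."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))

def rank_features(X, y, idx, w, k):
    """Top-k features by weighted |mean(+1) - mean(-1)| (a stand-in ranker, assumed)."""
    def score(j):
        means = {}
        for c in (1, -1):
            total = sum(w[i] for i in idx if y[i] == c)
            means[c] = sum(w[i] * X[i][j] for i in idx if y[i] == c) / total
        return abs(means[1] - means[-1])
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def fit_stump(X, y, idx, w, feats):
    """Lowest weighted-error decision stump over the selected features (assumed learner)."""
    best = None
    for j in feats:
        vals = sorted({X[i][j] for i in idx})
        # Candidate thresholds are midpoints between consecutive observed values.
        thresholds = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
        for t in thresholds:
            for pol in (1, -1):
                err = sum(w[i] for i in idx
                          if (pol if X[i][j] > t else -pol) != y[i])
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best[1:]

def stump_predict(stump, x):
    j, t, pol = stump
    return pol if x[j] > t else -pol

def boost(X, y, rounds=10, k=2, undersample=True, seed=0):
    """Boosting with per-round feature selection; optionally under-sample first."""
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # The only difference between the two variants is this sampling step.
        idx = random_undersample(y, rng) if undersample else list(range(n))
        feats = rank_features(X, y, idx, w, k)
        stump = fit_stump(X, y, idx, w, feats)
        # Weighted error is measured on the full training set, then clamped.
        err = sum(w[i] for i in range(n) if stump_predict(stump, X[i]) != y[i])
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # AdaBoost-style update: raise weights of misclassified instances.
        w = [w[i] * math.exp(-alpha * y[i] * stump_predict(stump, X[i]))
             for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Calling `boost(X, y, undersample=True)` and `boost(X, y, undersample=False)` on the same data gives the two variants side by side. Comparing them the way the study does would additionally require an imbalance-aware metric and a statistical test, neither of which is shown here.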
Pages: 527-534
Number of pages: 8