Does the Inclusion of Data Sampling Improve the Performance of Boosting Algorithms on Imbalanced Bioinformatics Data?

Cited by: 0
Authors
Fazelpour, Alireza [1]
Khoshgoftaar, Taghi M. [1]
Dittman, David J. [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Boosting; data sampling; ensemble learning; class imbalance; bioinformatics; CHEMOTHERAPY; PREDICTOR
DOI
10.1109/ICMLA.2015.23
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Bioinformatics datasets have many challenging characteristics, such as class imbalance, which adversely impacts the performance of supervised classification models built on these datasets. Techniques from data mining, such as ensemble learning and data sampling, can be deployed to alleviate the problem and improve classification performance. In this study, we investigate whether including data sampling within the ensemble framework can further improve the performance of classification models. To this end, we performed an experimental study using two new hybrid ensemble techniques (one integrates feature selection within the boosting process; the other incorporates random under-sampling followed by feature selection within the boosting framework), two learners, three forms of feature rankers, and four feature subset sizes on 15 highly imbalanced bioinformatics datasets. Our results and statistical analysis show that the difference between the two boosting methods is not statistically significant. Therefore, because the inclusion of data sampling has no significant positive effect on the performance of ensemble classifiers, it is not required to achieve maximum classification performance. To our knowledge, this is the first empirical study to examine the effect of data sampling, specifically random under-sampling, on the classification performance of boosting algorithms for highly imbalanced bioinformatics data.
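The second hybrid technique described in the abstract (random under-sampling, then feature selection, inside each boosting round) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the learners or feature rankers, so the decision-stump learner, the mean-difference ranker, the AdaBoost-style weight update, and all function names below are assumptions chosen only to keep the example self-contained.

```python
import math
import random

def undersample(y, rng):
    """Random under-sampling: keep every minority index, draw as many majority ones."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))

def rank_features(X, y, w, idx, k):
    """Top-k features by |difference of weighted class means| (stand-in ranker, an assumption)."""
    def score(j):
        sums, tot = {1: 0.0, -1: 0.0}, {1: 0.0, -1: 0.0}
        for i in idx:
            sums[y[i]] += w[i] * X[i][j]
            tot[y[i]] += w[i]
        return abs(sums[1] / tot[1] - sums[-1] / tot[-1])
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def fit_stump(X, y, w, idx, features):
    """Weighted decision stump (stand-in learner) fit on the sampled rows and selected features."""
    best = None
    for j in features:
        vals = sorted({X[i][j] for i in idx})
        # Candidate cuts are midpoints between consecutive values (or the value itself).
        cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
        for t in cuts:
            for sign in (1, -1):
                err = sum(w[i] for i in idx
                          if (sign if X[i][j] > t else -sign) != y[i])
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]  # (feature index, threshold, sign)

def stump_predict(stump, x):
    j, t, sign = stump
    return sign if x[j] > t else -sign

def boost(X, y, rounds=10, k=2, seed=0):
    """Boosting loop: under-sample, select features, fit, then reweight all instances."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        idx = undersample(y, rng)                # data sampling step
        feats = rank_features(X, y, w, idx, k)   # feature selection step
        stump = fit_stump(X, y, w, idx, feats)
        preds = [stump_predict(stump, x) for x in X]
        # Weighted error on the full training set, clipped to avoid log(0).
        err = sum(w[i] for i in range(n) if preds[i] != y[i])
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [w[i] * math.exp(-alpha * y[i] * preds[i]) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

Labels are assumed to be +1 and -1, with both classes present. Removing the `undersample` call and passing every row index to the ranker and learner gives a sketch of the first technique, boosting with feature selection only, which is the comparison the study draws.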
Pages: 527-534
Number of pages: 8
Related papers
50 in total
  • [1] Ensemble vs. Data Sampling: Which Option Is Best Suited to Improve Classification Performance of Imbalanced Bioinformatics Data?
    Khoshgoftaar, Taghi M.
    Fazelpour, Alireza
    Dittman, David J.
    Napolitano, Amri
    2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 705 - 712
  • [2] IMBoost: A New Weighting Factor for Boosting to Improve the Classification Performance of Imbalanced Data
    Roshan, Seyedehsan
    Tanha, Jafar
    Hallaji, Farzad
    Ghanbari, Mohammad-reza
    COMPLEXITY, 2023, 2023
  • [3] The Effect of Data Sampling When Using Random Forest on Imbalanced Bioinformatics Data
    Dittman, David J.
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    2015 IEEE 16TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2015, : 457 - 463
  • [4] Improving Learner Performance with Data Sampling and Boosting
    Seiffert, Chris
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    Napolitano, Amri
    20TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL 1, PROCEEDINGS, 2008, : 452 - 459
  • [5] OUBoost: boosting based over and under sampling technique for handling imbalanced data
    Mostafaei, Sahar Hassanzadeh
    Tanha, Jafar
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (10) : 3393 - 3411
  • [6] OUBoost: boosting based over and under sampling technique for handling imbalanced data
    Sahar Hassanzadeh Mostafaei
    Jafar Tanha
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 3393 - 3411
  • [7] Hybrid sampling for imbalanced data
    Seiffert, Chris
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 202 - 207
  • [8] Hybrid sampling for imbalanced data
    Seiffert, Chris
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    INTEGRATED COMPUTER-AIDED ENGINEERING, 2009, 16 (03) : 193 - 210
  • [9] Effects of the Use of Boosting on Classification Performance of Imbalanced Bioinformatics Datasets
    Khoshgoftaar, Taghi M.
    Fazelpour, Alireza
    Dittman, David J.
    Napolitano, Amri
    2014 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2014, : 420 - 426
  • [10] Selecting The Appropriate Data Sampling Approach for Imbalanced and High-Dimensional Bioinformatics Datasets
    Dittman, David J.
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    2014 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2014, : 304 - 310