Does the Inclusion of Data Sampling Improve the Performance of Boosting Algorithms on Imbalanced Bioinformatics Data?

Times cited: 0
Authors
Fazelpour, Alireza [1]
Khoshgoftaar, Taghi M. [1]
Dittman, David J. [1]
Napolitano, Amri [1]
Affiliation
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Keywords
Boosting; data sampling; ensemble learning; class imbalance; bioinformatics; chemotherapy; predictor
DOI
10.1109/ICMLA.2015.23
CLC classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Bioinformatics datasets exhibit many challenging characteristics, such as class imbalance, which adversely impacts the performance of supervised classification models built on these datasets. Data mining techniques such as ensemble learning and data sampling can be deployed to alleviate this problem and improve classification performance. In this study, we sought to determine whether including data sampling within the ensemble framework further improves the performance of classification models. To this end, we performed an experimental study using two new hybrid ensemble techniques (one integrates feature selection within the boosting process; the other incorporates random under-sampling followed by feature selection within the boosting framework), two learners, three forms of feature rankers, and four feature subset sizes on 15 highly imbalanced bioinformatics datasets. Our results and statistical analysis demonstrate that the difference between the two boosting methods is not statistically significant. Therefore, because including data sampling has no significant positive effect on the performance of the ensemble classifiers, it is not required to achieve maximum classification performance. To our knowledge, this is the first empirical study to examine whether data sampling, specifically random under-sampling, enhances the classification performance of boosting algorithms on highly imbalanced bioinformatics data.
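The two hybrid boosting variants compared in the abstract can be illustrated with a minimal, hypothetical Python sketch. It is not the authors' implementation: the abstract does not name the learners or the rankers, so the decision-stump weak learner, the signal-to-noise (S2N) feature ranker, the 1:1 under-sampling ratio, and the AdaBoost-style weight update with error measured on the full training set (as in RUSBoost) are all assumptions. Labels are assumed to be +1 for the minority class and -1 for the majority class. With `undersample=False` each round only selects features before training; with `undersample=True` each round applies random under-sampling first, then feature selection.

```python
import math
import random


def s2n_rank(X, y, w):
    """Rank features (best first) by weighted |mu+ - mu-| / (sd+ + sd-).

    Assumed ranker for illustration; assumes both classes (+1, -1) are present.
    """
    scores = []
    for j in range(len(X[0])):
        stats = {}
        for label in (1, -1):
            pairs = [(row[j], wi) for row, yi, wi in zip(X, y, w) if yi == label]
            total = sum(wi for _, wi in pairs)
            mean = sum(v * wi for v, wi in pairs) / total
            var = sum(wi * (v - mean) ** 2 for v, wi in pairs) / total
            stats[label] = (mean, math.sqrt(var))
        (m_pos, s_pos), (m_neg, s_neg) = stats[1], stats[-1]
        scores.append(abs(m_pos - m_neg) / (s_pos + s_neg + 1e-12))
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def fit_stump(X, y, w, features):
    """Weighted decision stump restricted to the selected feature subset."""
    best = None  # (weighted error, feature, threshold, polarity)
    for j in features:
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] >= t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    _, j, t, polarity = best
    return lambda row: polarity if row[j] >= t else -polarity


def random_undersample(y, rng, ratio=1.0):
    """Keep every minority (+1) index plus a random majority (-1) subset
    of size about ratio * (number of minority instances)."""
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == -1]
    k = min(len(neg), max(1, round(ratio * len(pos))))
    return pos + rng.sample(neg, k)


def boost(X, y, rounds=10, k_features=2, undersample=False, seed=0):
    """AdaBoost-style loop with per-round feature selection and optional RUS."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # (alpha, weak hypothesis) pairs
    for _ in range(rounds):
        # Optionally under-sample the majority class, then rank features on
        # the (possibly reduced) training set using renormalized weights.
        idx = random_undersample(y, rng) if undersample else list(range(n))
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        total = sum(w[i] for i in idx)
        ws = [w[i] / total for i in idx]
        selected = s2n_rank(Xs, ys, ws)[:k_features]
        h = fit_stump(Xs, ys, ws, selected)
        # Weighted error is computed on the full original training set.
        err = sum(wi for row, yi, wi in zip(X, y, w) if h(row) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)
        if err >= 0.5:
            break  # weak learner no better than chance; stop boosting
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        w = [wi * math.exp(-alpha * yi * h(row)) for row, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the weighted vote (ties go to the minority class, +1)."""
    score = sum(alpha * h(row) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


# Toy imbalanced data (4 minority vs 16 majority); only feature 0 is informative.
pos = [[3.0, -1.0, 0.5], [2.5, 1.0, -1.0], [3.5, 0.0, 1.0], [2.8, 0.5, 0.0]]
cycle = [-1.0, 1.0, 0.0, 0.5]
neg = [[-1.0 - 0.1 * i, cycle[i % 4], cycle[(i + 1) % 4]] for i in range(16)]
X, y = pos + neg, [1] * len(pos) + [-1] * len(neg)

fs_boost = boost(X, y, undersample=False)             # feature selection only
rus_fs_boost = boost(X, y, undersample=True, seed=1)  # RUS, then feature selection
```

This toy run only shows that both loops execute and fit a separable dataset; a comparison like the one in the study would repeat it across learners, rankers, subset sizes, and datasets, and then test whether the performance difference is significant.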
Pages: 527-534
Number of pages: 8