Does the Inclusion of Data Sampling Improve the Performance of Boosting Algorithms on Imbalanced Bioinformatics Data?

Cited by: 0

Authors:

Fazelpour, Alireza [1]

Khoshgoftaar, Taghi M. [1]

Dittman, David J. [1]

Napolitano, Amri [1]

Affiliations:

[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA

Source:

2015 IEEE 14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA) | 2015

Keywords:

Boosting; data sampling; ensemble learning; class imbalance; bioinformatics; CHEMOTHERAPY; PREDICTOR

DOI:

10.1109/ICMLA.2015.23

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Bioinformatics datasets have many challenging characteristics, such as class imbalance, which adversely affects the performance of supervised classification models built on these datasets. Data mining techniques such as ensemble learning and data sampling can be deployed to alleviate the problem and improve classification performance. In this study, we sought to determine whether including data sampling within the ensemble framework can further improve the performance of classification models. To this end, we performed an experimental study using two new hybrid ensemble techniques (one integrates feature selection within the boosting process; the other incorporates random under-sampling followed by feature selection within the boosting framework), two learners, three forms of feature rankers, and four feature subset sizes on 15 highly imbalanced bioinformatics datasets. Our results and statistical analysis demonstrate that the difference between the two boosting methods is statistically insignificant. Therefore, because the inclusion of data sampling has no significant positive effect on the performance of ensemble classifiers, it is not required to achieve maximum classification performance. To our knowledge, this is the first empirical study to examine the effect of data sampling, specifically random under-sampling, on the classification performance of boosting algorithms for highly imbalanced bioinformatics data.
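
The two hybrid schemes compared in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the boosting variant, the learners, or the feature rankers, so the AdaBoost-style reweighting, the decision-stump learner, the class-mean-gap ranker, and every function and parameter name below are assumptions chosen only for illustration. Labels are assumed to be +1 (minority) and -1 (majority), with both classes present. Setting `undersample=False` gives the variant with feature selection only; `undersample=True` adds random under-sampling before feature selection in each round.

```python
import math
import random


def random_undersample(y, rng):
    """Keep every minority (+1) index plus an equally sized random draw of majority (-1) indices."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == -1]
    return minority + rng.sample(majority, min(len(minority), len(majority)))


def rank_features(X, y, idx, k):
    """Toy filter ranker (assumption): score a feature by the absolute gap between class means."""
    def gap(j):
        pos = [X[i][j] for i in idx if y[i] == 1]
        neg = [X[i][j] for i in idx if y[i] == -1]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def stump_predict(stump, row):
    """Predict `sign` when the feature value exceeds the threshold, otherwise `-sign`."""
    j, t, sign = stump
    return sign if row[j] > t else -sign


def fit_stump(X, y, w, idx, features):
    """Weighted decision stump (assumed weak learner) fitted on the sampled rows only."""
    best = None
    for j in features:
        for t in sorted({X[i][j] for i in idx}):
            for sign in (1, -1):
                err = sum(w[i] for i in idx if stump_predict((j, t, sign), X[i]) != y[i])
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]


def boost(X, y, rounds=10, k=2, undersample=True, seed=0):
    """AdaBoost-style loop: optional under-sampling, then feature selection, then a weak learner."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        idx = random_undersample(y, rng) if undersample else list(range(n))
        features = rank_features(X, y, idx, k)
        stump = fit_stump(X, y, w, idx, features)
        # Weighted error is measured on the full training set, then clipped to avoid log(0).
        err = sum(w[i] for i in range(n) if stump_predict(stump, X[i]) != y[i])
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified rows, decrease the rest, then renormalize.
        w = [w[i] * math.exp(-alpha * y[i] * stump_predict(stump, X[i])) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Calling `boost(X, y, undersample=True)` and `boost(X, y, undersample=False)` on the same data and comparing the predictions mirrors, in miniature, the comparison the abstract describes. The sketch makes no claim about which variant performs better.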

Pages: 527 - 534

Number of pages: 8

50 records in total

[31] Classification on Imbalanced Data Sets, Taking Advantage of Errors to Improve Performance
Lopez-Chau, Asdrubal
Garcia-Lamont, Farid
Cervantes, Jair
ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, ICIC 2015, PT III, 2015, 9227 : 72 - 78
[32] Boosting support vector machines for imbalanced data sets
Wang, Benjamin X.
Japkowicz, Nathalie
KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 25 (01) : 1 - 20
[33] Boosting support vector machines for imbalanced data sets
Wang, Benjamin X.
Japkowicz, Nathalie
FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2008, 4994 : 38 - 47
[34] MEBoost: Mixing Estimators with Boosting for Imbalanced Data Classification
Rayhan, Farshid
Ahmed, Sajid
Mahbub, Asif
Jani, Md. Rafsan
Shatabda, Swakkhar
Farid, Dewan Md.
Rahman, Chowdhury Mofizur
2017 11TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT AND APPLICATIONS (SKIMA), 2017
[35] Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data
Khoshgoftaar, Taghi M.
Van Hulse, Jason
Napolitano, Amri
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2011, 41 (03) : 552 - 568
[36] Boosting support vector machines for imbalanced data sets
Benjamin X. Wang
Nathalie Japkowicz
Knowledge and Information Systems, 2010, 25 : 1 - 20
[37] Boosting Mobile Apps under Imbalanced Sensing Data
Zhang, Xinglin
Yang, Zheng
Shangguan, Longfei
Liu, Yunhao
Chen, Lei
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2015, 14 (06) : 1151 - 1161
[38] Boosting imbalanced data learning with Wiener process oversampling
Qian Li
Gang Li
Wenjia Niu
Yanan Cao
Liang Chang
Jianlong Tan
Li Guo
Frontiers of Computer Science, 2017, 11 : 836 - 851
[39] Boosting imbalanced data learning with Wiener process oversampling
Li, Qian
Li, Gang
Niu, Wenjia
Cao, Yanan
Chang, Liang
Tan, Jianlong
Guo, Li
FRONTIERS OF COMPUTER SCIENCE, 2017, 11 (05) : 836 - 851
[40] Oversampling boosting for classification of imbalanced software defect data
Li, Guangling
Wang, Shihai
PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016 : 4149 - 4154
