An ensemble method using small training sets for imbalanced data sets: Application to drugs used for kinases

Times cited: 0

Authors:

Rani, T. Sobha [1]

Soujanya, P. V. [2]

Affiliations:

[1] Univ Hyderabad, Sch Comp & Informat Sci, Computat Intelligence Lab, Hyderabad 500134, Andhra Pradesh, India

[2] Tata Consultancy Serv, Dept Syst Engn, Bangalore, Karnataka, India

Source:

2013 SIXTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3) | 2013

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

Nearly all aspects of cell life and death are controlled by protein phosphorylation, which is catalyzed by kinases. Kinase malfunction results in cell disorders that cause cancers and other diseases. The present study identifies the predominant features of inhibitors that target these enzymes and uses machine learning algorithms to classify kinase and non-kinase inhibitors. The work addresses two challenges. The first is the classification of unbalanced data sets, that is, data sets in which the classes differ greatly in size. The second is concept complexity: the minority and majority classes lie close together in the feature space. Our approach performs binary classification of the approved human inhibitors in the DrugBank database into kinase and non-kinase inhibitors. The inhibitors are first clustered and then classified by an ensemble of several classification models. Classification is done in two levels, with weighted voting applied after each level. After the two levels of classification, an overall accuracy of 80% is obtained. We thus establish a new type of approach for classifying unbalanced data sets and data sets in which instances of different classes overlap. Finally, we establish a signature specific to kinase inhibitors.
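The abstract does not specify how the small training sets are formed, which base classifiers are used, or how the voting weights are computed. The following is a minimal Python sketch under stated assumptions, not the authors' method: it splits the majority class ("non-kinase") into chunks about the size of the minority class ("kinase"), pairs each chunk with all minority samples to obtain small balanced training sets, fits a nearest-centroid classifier on each (a stand-in base learner chosen only for illustration), and combines the models by weighted voting, using each model's validation accuracy as its weight. The function names, the chunking scheme, the accuracy-based weights, and the toy two-feature data are hypothetical; the clustering step and the second classification level described in the abstract are not reproduced.

```python
import math
import random
from collections import defaultdict


def balanced_subsets(majority, minority, seed=0):
    """Hypothetical scheme: shuffle the majority samples, cut them into chunks
    of len(minority), and pair each chunk with every minority sample.
    The last chunk may be smaller if the sizes do not divide evenly."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    size = len(minority)
    return [shuffled[i:i + size] + list(minority)
            for i in range(0, len(shuffled), size)]


def fit_centroids(samples):
    """Stand-in base learner: store the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(model, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(model, key=lambda y: math.dist(x, model[y]))


def accuracy(model, samples):
    """Fraction of (features, label) pairs the model classifies correctly."""
    return sum(predict(model, x) == y for x, y in samples) / len(samples)


def weighted_vote(models, weights, x):
    """Each model votes for its predicted class with its weight;
    the class with the largest total weight wins."""
    scores = defaultdict(float)
    for model, w in zip(models, weights):
        scores[predict(model, x)] += w
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy, made-up data: 9 "non-kinase" (majority) vs 3 "kinase" (minority).
    majority = [((i % 3, i % 2), "non-kinase") for i in range(9)]
    minority = [((5, 5), "kinase"), ((6, 5), "kinase"), ((5, 6), "kinase")]
    subsets = balanced_subsets(majority, minority)  # 3 sets of 6 samples
    models = [fit_centroids(s) for s in subsets]
    # Hypothetical weighting: accuracy on a validation set (here, all samples).
    weights = [accuracy(m, majority + minority) for m in models]
    print(weighted_vote(models, weights, (0.5, 0.5)))  # → non-kinase
    print(weighted_vote(models, weights, (5.5, 5.5)))  # → kinase
```

Training each base model on a small balanced set means every model sees the minority class as often as the majority class, while the ensemble as a whole still covers every majority sample.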


Pages: 516–521

Number of pages: 6
