An ensemble method using small training sets for imbalanced data sets: Application to drugs used for kinases

Cited by: 0
Authors
Rani, T. Sobha [1]
Soujanya, P. V. [2]
Affiliations
[1] Univ Hyderabad, Sch Comp & Informat Sci, Computat Intelligence Lab, Hyderabad 500134, Andhra Pradesh, India
[2] Tata Consultancy Serv, Dept Syst Engn, Bangalore, Karnataka, India
Source
2013 SIXTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3) | 2013
Keywords
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Nearly all aspects of cell life and death are controlled by the phosphorylation of proteins, which is catalyzed by kinases. Malfunctioning kinases cause cell disorders that lead to cancers and other diseases. The present study identifies the predominant features of the inhibitors that target these enzymes and classifies kinase and non-kinase inhibitors using machine learning algorithms. The work addresses two challenges. The first is the classification of unbalanced data sets, that is, data sets whose constituent classes differ markedly in size. The second is concept complexity (minority and majority classes that lie close together in the feature space). Our approach performs binary classification of the approved human inhibitors in the DrugBank database into kinase and non-kinase inhibitors. The inhibitors are first clustered and then classified by an ensemble of several classification models. Classification is done in two levels, with weighted voting used after each level. An overall accuracy of 80% is obtained after the two levels of classification. We thus establish a new type of approach for classifying unbalanced data sets and data sets in which instances belonging to different classes overlap. Finally, we establish a signature specific to kinase inhibitors.
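The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it names: several models trained on small, balanced training sets and combined by weighted voting. The base learner (a toy nearest-centroid classifier), the weighting rule (accuracy on the full training data), the synthetic data, and every function name here are assumptions made for illustration. The clustering step and the second classification level described in the abstract are not modeled.

```python
import random
from collections import defaultdict


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy base learner (an assumption): predicts the class with the closest centroid."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.centroids = {label: centroid(rows) for label, rows in by_class.items()}
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda label: sq_dist(row, self.centroids[label]))


def balanced_subsets(X, y, minority, n_models, rng):
    """Yield small training sets: all minority rows plus an equal-size majority sample."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    k = min(len(minority_idx), len(majority_idx))
    for _ in range(n_models):
        idx = minority_idx + rng.sample(majority_idx, k)
        yield [X[i] for i in idx], [y[i] for i in idx]


def accuracy(model, X, y):
    """Fraction of rows in X that the model labels correctly."""
    return sum(model.predict_one(r) == label for r, label in zip(X, y)) / len(y)


def train_ensemble(X, y, minority, n_models=5, seed=0):
    """Train one model per balanced subset; weight it by accuracy on the full data."""
    rng = random.Random(seed)
    ensemble = []
    for Xs, ys in balanced_subsets(X, y, minority, n_models, rng):
        model = NearestCentroid().fit(Xs, ys)
        ensemble.append((model, accuracy(model, X, y)))
    return ensemble


def weighted_vote(ensemble, row):
    """Return the label with the largest total weight across all models."""
    scores = defaultdict(float)
    for model, weight in ensemble:
        scores[model.predict_one(row)] += weight
    return max(scores, key=scores.get)


# Synthetic, imbalanced toy data (hypothetical): 40 majority rows, 5 minority rows.
X = [[0.1 * i, 0.1 * (i % 7)] for i in range(40)] + [[5.0 + 0.1 * i, 5.0] for i in range(5)]
y = [0] * 40 + [1] * 5

ensemble = train_ensemble(X, y, minority=1)
print(weighted_vote(ensemble, [5.2, 5.0]), weighted_vote(ensemble, [0.5, 0.2]))
```

Each model sees every minority instance but only a small random slice of the majority class, so no single training set is dominated by the majority. Weighting votes by accuracy is one common choice and may differ from the weighting used in the paper.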
Pages: 516-521
Number of pages: 6
Related papers
50 in total
  • [21] Data-based structural health monitoring using small training data sets
    Balsamo, Luciana
    Betti, Raimondo
    STRUCTURAL CONTROL & HEALTH MONITORING, 2015, 22 (10): 1240 - 1264
  • [22] An Effective Over-sampling Method for Imbalanced Data Sets Classification
    Zhai Yun
    Ma Nan
    Ruan Da
    An Bing
    CHINESE JOURNAL OF ELECTRONICS, 2011, 20 (03): 489 - 494
  • [23] Research on Method Application of Transforming Fuzzy Sets Using SPA Sets
    Xie, Li
    Zhou, Wenbo
    Shi, Lei
    3RD INTERNATIONAL CONFERENCE ON APPLIED ENGINEERING, 2016, 51 : 637 - 642
  • [24] SVM classification for imbalanced data sets using a multiobjective optimization framework
    Askan, Aysegul
    Sayin, Serpil
    ANNALS OF OPERATIONS RESEARCH, 2014, 216 (01) : 191 - 203
  • [25] SVM classification for imbalanced data sets using a multiobjective optimization framework
    Ayşegül Aşkan
    Serpil Sayın
    Annals of Operations Research, 2014, 216 : 191 - 203
  • [26] Classification of Imbalanced data sets using Multi Objective Genetic Programming
    Maheta, Hardik H.
    Dabhi, Vipul K.
    2015 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2015,
  • [27] Classifying imbalanced data sets using similarity based hierarchical decomposition
    Beyan, Cigdem
    Fisher, Robert
    PATTERN RECOGNITION, 2015, 48 (05) : 1653 - 1672
  • [28] Stacked generalizations in imbalanced fraud data sets using resampling methods
    Kerwin, Kathleen R.
    Bastian, Nathaniel D.
    JOURNAL OF DEFENSE MODELING AND SIMULATION-APPLICATIONS METHODOLOGY TECHNOLOGY-JDMS, 2021, 18 (03): 175 - 192
  • [29] Classification Performance of Bagging and Boosting Type Ensemble Methods with Small Training Sets
    M. Faisal Zaman
    Hideo Hirose
    New Generation Computing, 2011, 29 : 277 - 292
  • [30] Classification Performance of Bagging and Boosting Type Ensemble Methods with Small Training Sets
    Zaman, M. Faisal
    Hirose, Hideo
    NEW GENERATION COMPUTING, 2011, 29 (03) : 277 - 292