An ensemble method using small training sets for imbalanced data sets: Application to drugs used for kinases

Cited by: 0
Authors
Rani, T. Sobha [1]
Soujanya, P. V. [2]
Affiliations
[1] Univ Hyderabad, Sch Comp & Informat Sci, Computat Intelligence Lab, Hyderabad 500134, Andhra Pradesh, India
[2] Tata Consultancy Serv, Dept Syst Engn, Bangalore, Karnataka, India
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Nearly all aspects of cell life and death are controlled by the phosphorylation of proteins, which is catalyzed by kinases. Malfunctioning kinases cause cell disorders that lead to cancers and other diseases. The present study identifies the predominant features of the inhibitors that target these enzymes and classifies kinase and non-kinase inhibitors using machine learning algorithms. The work addresses two challenges. The first is the classification of unbalanced data sets, that is, data sets whose classes differ markedly in size. The second is concept complexity (minority and majority classes that lie close together in the feature space). Our approach performs binary classification of the approved human inhibitors in the DrugBank database into kinase and non-kinase inhibitors. The inhibitors are first clustered and then classified by an ensemble of several classification models. Classification is done in two levels, and weighted voting is used after each level. An overall accuracy of 80% is obtained after the two levels of classification. We thus establish a new type of approach for classifying unbalanced data sets and data sets in which instances belonging to different classes overlap. Finally, we establish a signature specific to kinase inhibitors.
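The abstract does not say how the small training sets are built or how the voting weights are chosen, so the following Python sketch is only an illustration under stated assumptions. It pairs each slice of the majority class with the full minority class to form small balanced training sets, and it combines model outputs by weighted voting. The function names `balanced_subsets` and `weighted_vote`, the round-robin split, and the use of arbitrary non-negative weights (for example, validation accuracies) are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def balanced_subsets(majority, minority, n_subsets):
    """Build small training sets (assumed scheme, not the paper's).

    The majority class is split round-robin into n_subsets chunks and
    each chunk is paired with a copy of the whole minority class.
    """
    chunks = [majority[i::n_subsets] for i in range(n_subsets)]
    return [(chunk, list(minority)) for chunk in chunks]


def weighted_vote(predictions, weights):
    """Return the label whose supporting models have the largest total weight.

    predictions: one predicted label per model.
    weights: one non-negative weight per model (assumed to come from
             something like validation accuracy).
    """
    if len(predictions) != len(weights):
        raise ValueError("need one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Iterating over sorted labels makes ties resolve deterministically.
    return max(sorted(totals), key=lambda label: totals[label])


if __name__ == "__main__":
    subsets = balanced_subsets(list(range(6)), ["k1", "k2"], n_subsets=3)
    print(subsets)
    print(weighted_vote(["kinase", "non-kinase", "non-kinase"], [0.9, 0.3, 0.4]))
```

In this sketch, one strongly weighted model can outvote two weakly weighted ones, which is the property that distinguishes weighted voting from a simple majority vote.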
Pages: 516-521
Number of pages: 6
Related papers
50 items in total
  • [1] Diversity Analysis on Imbalanced Data Sets by Using Ensemble Models
    Wang, Shuo
    Yao, Xin
    2009 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING, 2009, : 324 - 331
  • [2] Editing Training Sets from Imbalanced Data Using Fuzzy-Rough Sets
    Nguyen, Do Van
    Ogawa, Keisuke
    Matsumoto, Kazunori
    Hashimoto, Masayuki
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, 2015, 458 : 115 - 129
  • [3] A LEARNING METHOD FOR IMBALANCED DATA SETS
    de la Calleja, Jorge
    Fuentes, Olac
    Gonzalez, Jesus
    Aceves-Perez, Rita M.
    KDIR 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2009, : 307 - +
  • [4] Hybrid kernel machine ensemble for imbalanced data sets
    Li, Peng
    Chan, Kap Luk
    Fang, Wen
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 1108 - +
  • [5] An improved P-SVM method used to deal with imbalanced data sets
    Chen Li
    Chen Jing
    Gao Xin-tao
    2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INTELLIGENT SYSTEMS, PROCEEDINGS, VOL 1, 2009, : 118 - +
  • [6] Training Deep Neural Networks on Imbalanced Data Sets
    Wang, Shoujin
    Liu, Wei
    Wu, Jia
    Cao, Longbing
    Meng, Qinxue
    Kennedy, Paul J.
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 4368 - 4374
  • [7] Ensemble based Classification using Small Training sets : A Novel Approach
    Veni, C. V. Krishna
    Rani, T. Sobha
    2014 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ENSEMBLE LEARNING (CIEL), 2014, : 13 - 20
  • [8] An Adaptive Sampling Ensemble Classifier for Learning from Imbalanced Data Sets
    Geiler, Ordonez Jon
    Hong, Li
    Yue-Jian, Guo
    INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS (IMECS 2010), VOLS I-III, 2010, : 513 - 517
  • [9] HTSS: a hyper-heuristic training set selection method for imbalanced data sets
    Nikpour, Bahareh
    Nezamabadi-pour, Hossein
    Iran Journal of Computer Science, 2018, 1 (2) : 109 - 128
  • [10] A memetic approach for training set selection in imbalanced data sets
    Nikpour, Bahareh
    Nezamabadi-pour, Hossein
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (11) : 3043 - 3070