An ensemble method using small training sets for imbalanced data sets: Application to drugs used for kinases

Cited by: 0
Authors
Rani, T. Sobha [1]
Soujanya, P. V. [2]
Affiliations
[1] Univ Hyderabad, Sch Comp & Informat Sci, Computat Intelligence Lab, Hyderabad 500134, Andhra Pradesh, India
[2] Tata Consultancy Serv, Dept Syst Engn, Bangalore, Karnataka, India
Source
2013 SIXTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3), 2013
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Nearly all aspects of cell life and death are controlled by the phosphorylation of proteins, a reaction catalyzed by kinases. Malfunctioning kinases cause cell disorders that lead to cancers and other diseases. The present study identifies the predominant features of inhibitors that target these enzymes and classifies kinase and non-kinase inhibitors using machine learning algorithms. The work addresses two challenges. The first is the classification of unbalanced data sets, that is, data sets in which one class has far more instances than the other. The second is concept complexity: the minority and majority classes lie close together in the feature space. Our approach performs binary classification of the approved human inhibitors in the DrugBank database into kinase and non-kinase inhibitors. The inhibitors are first clustered and then classified by an ensemble of several classification models. Classification is done in two levels, with weighted voting after each level. An overall accuracy of 80% is obtained after the two levels of classification. We thus establish a new type of approach for classifying unbalanced data sets and data sets in which instances of different classes overlap. Finally, we establish a signature specific to kinase inhibitors.
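The abstract outlines the method (clustering, an ensemble of classifiers trained on small sets, weighted voting) but not its exact algorithms, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes: k-means clustering of the majority class only; one small, roughly balanced training set per cluster (all minority instances plus an equal-sized sample from that cluster); a nearest-centroid base classifier; and each model's weight equal to its accuracy on its own training set. It shows a single level of weighted voting, whereas the paper uses two. The names `kmeans`, `NearestCentroid`, `build_ensemble`, and `weighted_vote`, and the labels 1 (minority) and 0 (majority), are hypothetical.

```python
import random
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters (lists of points)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute centres; an empty cluster keeps its previous centre.
        centers = [mean(cl) if cl else centers[j] for j, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]


class NearestCentroid:
    """Hypothetical base learner: predicts the label whose centroid is closest."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {label: mean(xs) for label, xs in groups.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: dist2(x, self.centroids[label]))


def build_ensemble(minority, majority, k, seed=0):
    """Train one model per majority-class cluster on a small, balanced set."""
    rng = random.Random(seed)
    models = []
    for cluster in kmeans(majority, k, seed=seed):
        # Subsample so the majority part is no larger than the minority class.
        sample = rng.sample(cluster, min(len(cluster), len(minority)))
        X = minority + sample
        y = [1] * len(minority) + [0] * len(sample)
        model = NearestCentroid().fit(X, y)
        # Assumed weight: accuracy on the model's own training set.
        acc = sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)
        models.append((model, acc))
    return models


def weighted_vote(models, x):
    """Add each model's weight to the label it predicts; return the heaviest."""
    score = defaultdict(float)
    for model, weight in models:
        score[model.predict(x)] += weight
    return max(score, key=score.get)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic toy data: 20 minority points near the origin and
    # 201 majority points in three blobs away from it.
    minority = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
    majority = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
                for cx, cy in [(4, 0), (0, 4), (4, 4)] for _ in range(67)]
    models = build_ensemble(minority, majority, k=3)
    print(weighted_vote(models, [0.1, -0.2]))  # prints 1 (minority)
    print(weighted_vote(models, [4.1, 3.9]))   # prints 0 (majority)
```

Each small training set is balanced by construction, so no single model is swamped by the majority class, and clustering lets different models specialise in different regions of the majority class. Weights computed on a held-out validation set would be a less optimistic choice than training accuracy.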
Pages: 516–521
Page count: 6