An ensemble method using small training sets for imbalanced data sets: Application to drugs used for kinases

Cited by: 0
Authors
Rani, T. Sobha [1]
Soujanya, P. V. [2]
Affiliations
[1] Univ Hyderabad, Sch Comp & Informat Sci, Computat Intelligence Lab, Hyderabad 500134, Andhra Pradesh, India
[2] Tata Consultancy Serv, Dept Syst Engn, Bangalore, Karnataka, India
Source
2013 SIXTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3) | 2013
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Nearly all aspects of cell life and death are controlled by protein phosphorylation, which is catalyzed by kinases. Kinase malfunction results in cell disorders that cause cancers and other diseases. The present study identifies the predominant features of inhibitors that target these enzymes and classifies kinase and non-kinase inhibitors using machine learning algorithms. The work addresses two challenges. The first is the classification of unbalanced data sets, that is, data sets in which the classes differ substantially in size. The second is concept complexity: the minority and majority classes lie close together in the feature space. Our approach performs binary classification of the approved human inhibitors in the DrugBank database into kinase and non-kinase inhibitors. The inhibitors are first clustered and then classified by an ensemble of several classification models. Classification is done in two levels, with weighted voting applied after each level. After both levels of classification, an overall accuracy of 80% is obtained. We thus establish a new type of approach for classifying unbalanced data sets and data sets in which instances belonging to different classes overlap. In addition, we establish a signature specific to kinase inhibitors.
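The abstract does not state how the voting weights are chosen or how the two classification levels are chained. As an illustration of weighted voting alone, the sketch below assumes that each classifier in the ensemble contributes one predicted label and one non-negative weight (hypothetically, its accuracy on held-out data). The function name `weighted_vote` and this weighting choice are assumptions for illustration, not the authors' method.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_vote(predictions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Combine one predicted label per classifier into a single ensemble label.

    predictions[i] is the label from classifier i and weights[i] is its weight
    (hypothetically, its held-out accuracy). The label with the largest summed
    weight wins; ties go to the label that appears first in `predictions`.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    if not predictions:
        raise ValueError("at least one prediction is required")

    totals: dict = defaultdict(float)  # label -> summed weight
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # max() returns the first maximal key in insertion order, giving the tie rule above.
    return max(totals, key=totals.__getitem__)


if __name__ == "__main__":
    # Three hypothetical classifiers voting on one inhibitor.
    print(weighted_vote(["kinase", "non-kinase", "non-kinase"], [0.9, 0.4, 0.3]))
```

With weights 0.9, 0.4 and 0.3, one strong classifier voting "kinase" outvotes two weaker ones voting "non-kinase" (0.9 against 0.7), whereas an unweighted majority vote would return "non-kinase". In a two-level scheme, the same function could in principle be applied once per level, but that arrangement is an assumption here.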
Pages: 516-521
Number of pages: 6
Related papers (50 in total)
  • [31] A novel anomaly detection using small training sets
    Yin, QB
    Shen, LR
    Zhang, RB
    Li, XY
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2005, PROCEEDINGS, 2005, 3578 : 258 - 263
  • [32] CSIML: a cost-sensitive and iterative machine-learning method for small and imbalanced materials data sets
    Li, Shengzhou
    Nakata, Ayako
    CHEMISTRY LETTERS, 2024, 53 (05)
  • [33] Ensemble of surrogates and cross-validation for rapid and accurate predictions using small data sets
    Alizadeh, Reza
    Jia, Liangyue
    Nellippallil, Anand Balu
    Wang, Guoxin
    Hao, Jia
    Allen, Janet K.
    Mistree, Farrokh
    AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING, 2019, 33 (04) : 484 - 501
  • [34] HYBS: A novel hybrid sampling method for learning from imbalanced data sets
    Liu, Zhiyong
    Yu, Hualong
    International Journal of Advancements in Computing Technology, 2012, 4 (10) : 281 - 288
  • [35] Application of a widely used denitrification model to Dutch data sets
    Heinen, Marius
    GEODERMA, 2006, 133 (3-4) : 464 - 473
  • [36] Clustering boundary over-sampling classification method for imbalanced data sets
    Lou, Xiao-Jun
    Sun, Yu-Xuan
    Liu, Hai-Tao
    Zhejiang University (47) : 944 - 950
  • [37] Fuzzy-Pattern-Classifier Training with Small Data Sets
    Moenks, Uwe
    Petker, Denis
    Lohweg, Volker
    INFORMATION PROCESSING AND MANAGEMENT OF UNCERTAINTY IN KNOWLEDGE-BASED SYSTEMS: THEORY AND METHODS, PT 1, 2010, 80 : 426 - +
  • [38] A Learning Method For Small Data Sets With Multimodality Variables
    Li, Der-Chiang
    Chang, Yu-Ching
    Su, Mei-Lan
    Lin, Liang-Sian
    PROCEEDINGS OF 2013 IEEE INTERNATIONAL CONFERENCE ON GREY SYSTEMS AND INTELLIGENT SERVICES (GSIS), 2013 : 481 - 483
  • [39] GEN, A COMPUTERIZED STATISTICAL PROCEDURE FOR CREATING LARGE DATA SETS FROM SMALL DATA SETS FOR TRAINING DISCRIMINANT FUNCTIONS
    LATHROP, LD
    PENNYPACKER, SP
    PHYTOPATHOLOGY, 1979, 69 (09) : 1036 - 1036
  • [40] Ensemble learning by means of a multi-objective optimization design approach for dealing with imbalanced data sets
    Ribeiro, Victor Henrique Alves
    Reynoso-Meza, Gilberto
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 147