An ensemble method using small training sets for imbalanced data sets: Application to drugs used for kinases

Times cited: 0

Authors:

Rani, T. Sobha [1]

Soujanya, P. V. [2]

Affiliations:

[1] Univ Hyderabad, Sch Comp & Informat Sci, Computat Intelligence Lab, Hyderabad 500134, Andhra Pradesh, India

[2] Tata Consultancy Serv, Dept Syst Engn, Bangalore, Karnataka, India

Source:

2013 SIXTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3) | 2013

DOI: Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Nearly all aspects of cell life and death are controlled by the phosphorylation of proteins, a reaction catalyzed by kinases. Malfunctioning kinases cause cell disorders that lead to cancers and other diseases. The present study identifies the predominant features of inhibitors that target these enzymes and uses machine learning algorithms to classify kinase and non-kinase inhibitors. The work addresses two challenges. The first is the classification of unbalanced data sets, that is, data sets in which the classes differ greatly in size. The second is concept complexity: the minority and majority classes lie close together in the feature space. Our approach performs binary classification of the approved human inhibitors in the DrugBank database into kinase and non-kinase inhibitors. The inhibitors are first clustered and then classified by an ensemble of several classification models. Classification is done in two levels, with weighted voting applied after each level. An overall accuracy of 80% is obtained after the two levels of classification. We thus established a new type of approach for classifying unbalanced data sets and data sets in which instances belonging to different classes overlap. Finally, we established a signature specific to kinase inhibitors.
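The abstract combines the outputs of several classifiers by weighted voting, applied at two levels. It does not say how the weights are chosen or what the second level combines. Below is a minimal Python sketch of weighted majority voting. It assumes each model's weight is a score such as its held-out accuracy. The function name weighted_vote, the weights, and the two-level usage are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_vote(predictions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the label whose supporting models have the largest total weight.

    predictions[i] is the label predicted by model i for one sample;
    weights[i] is that model's voting weight (assumed here to be, e.g.,
    its accuracy on held-out data).
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Pick the label with the highest accumulated weight; on a tie, max()
    # keeps the first label encountered (dicts preserve insertion order).
    return max(totals, key=totals.get)


# Hypothetical level 1: three models vote on one inhibitor.
level1 = weighted_vote(["kinase", "non-kinase", "kinase"], [0.6, 0.9, 0.5])
# Hypothetical level 2: the level-1 decision is combined with another ensemble's decision.
level2 = weighted_vote([level1, "non-kinase"], [0.7, 0.8])
print(level1, level2)  # kinase non-kinase
```

Weighting lets a single reliable model outvote weaker ones: at level 1 the two "kinase" votes (0.6 + 0.5) outweigh the single "non-kinase" vote (0.9). Plain majority voting would treat every model equally.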


Pages: 516-521

Number of pages: 6
