An ensemble method using small training sets for imbalanced data sets: Application to drugs used for kinases

Cited by: 0

Authors:

Rani, T. Sobha ^{[1]}

Soujanya, P. V. ^{[2]}

Affiliations:

[1] Univ Hyderabad, Sch Comp & Informat Sci, Computat Intelligence Lab, Hyderabad 500134, Andhra Pradesh, India

[2] Tata Consultancy Serv, Dept Syst Engn, Bangalore, Karnataka, India

Source:

2013 SIXTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3) | 2013

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Nearly all aspects of cell life and death are controlled by the phosphorylation of proteins, which is catalyzed by kinases. Malfunctioning of kinases results in cell disorders that cause cancers and other diseases. The present study deals with the identification of the predominant features present in the inhibitors targeting these enzymes, and with the classification of kinase and non-kinase inhibitors using machine learning algorithms. The work addresses two challenges. The first is the classification of unbalanced data sets, that is, data sets in which the classes differ markedly in size. The second is concept complexity (closely related minority and majority data sets in the feature space). Our approach deals with the binary classification of approved human inhibitors present in the DrugBank database into kinase and non-kinase inhibitors. The inhibitors are first clustered and then classified using an ensemble consisting of several classification models. Classification is done in two levels, and weighted voting is used after each level. An overall accuracy of 80% is obtained after the two levels of classification. Thus we established a new type of approach for the classification of unbalanced data sets and of data sets in which there is an overlap between instances belonging to different classes. Finally, we established a signature specific to kinase inhibitors.
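The abstract names weighted voting over an ensemble trained on small training sets but gives no details. The following Python sketch is illustrative only and is not the authors' implementation. The function names `weighted_vote` and `balanced_subsets` are hypothetical, as is the choice of weights (for instance, each member's validation accuracy). The majority class is split here by simple striding, which stands in for the clustering step mentioned in the abstract.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per ensemble member
    weights: one non-negative weight per member (assumption: something
             like each member's validation accuracy)
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties go to the label that was seen first (dict insertion order).
    return max(totals, key=totals.get)


def balanced_subsets(majority, minority, n_subsets):
    """Pair each of n_subsets slices of the majority class with the full
    minority class, giving small and roughly balanced training sets.

    Assumption: striding replaces the clustering used in the paper.
    """
    chunks = [majority[i::n_subsets] for i in range(n_subsets)]
    return [(chunk, list(minority)) for chunk in chunks]


if __name__ == "__main__":
    # Three hypothetical ensemble members vote on one inhibitor.
    votes = ["kinase", "non-kinase", "non-kinase"]
    member_weights = [0.9, 0.4, 0.4]
    print(weighted_vote(votes, member_weights))
```

In this sketch a single high-weight member can outvote two low-weight members, which is the practical difference between weighted and plain majority voting. A two-level scheme could apply `weighted_vote` once within each group of models and again across the group-level decisions, though the abstract does not specify how the two levels are combined.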

Pages: 516-521

Page count: 6
