PTML Combinatorial Model of ChEMBL Compounds Assays for Multiple Types of Cancer

Cited by: 45
Authors
Bediaga, Harbil [1]
Arrasate, Sonia [1]
Gonzalez-Diaz, Humberto [1,2]
Affiliations
[1] Univ Basque Country, Dept Organ Chem 2, UPV EHU, Leioa 48940, Spain
[2] Ikerbasque, Basque Fdn Sci, E-48011 Bilbao, Spain
Keywords
ChEMBL; anticancer compounds; perturbation theory; machine learning; artificial neural networks; big data; multitarget models; IN-SILICO DISCOVERY; SEQUENCE AUTOCORRELATION VECTORS; MULTITARGET DRUG DISCOVERY; GENETIC NEURAL-NETWORKS; BOX-JENKINS OPERATORS; CONFORMATIONAL STABILITY; WEB SERVER; QSAR MODEL; SIMULTANEOUS PREDICTION; FUNCTIONAL DOMAIN;
DOI
10.1021/acscombsci.8b00090
Chinese Library Classification number
O69 [Applied Chemistry]
Discipline classification code
081704
Abstract
Determining the target proteins of new anticancer compounds is an important task in medicinal chemistry. To this end, chemists carry out preclinical assays under a large number of combinations of experimental conditions (c(j)). The ChEMBL database contains the outcomes of 65 534 different preclinical anticancer activity assays for 35 565 different chemical compounds (1.84 assays per compound). These assays cover different combinations of c(j) formed from >70 different biological activity parameters (c(0)), >300 different drug targets (c(1)), >230 cell lines (c(2)), and 5 organisms of assay (c(3)) or organisms of the target (c(4)). The data include a total of 45 833 assays in leukemia, 6227 in breast cancer, 2499 in ovarian cancer, 3499 in colon cancer, 3159 in lung cancer, 2750 in prostate cancer, 601 in melanoma, etc. This is a very complex data set with multiple Big Data features, and it is difficult for researchers to rationalize it in order to extract useful relationships and predict new compounds. In this context, we propose combining perturbation theory (PT) ideas with machine learning (ML) modeling to solve this combinatorial-like problem. In this work, we report a PTML (PT + ML) model for the ChEMBL data set of preclinical assays of anticancer compounds. It is a simple linear model with only three variables. In the training series, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.872, specificity Sp(%) = 90.2, sensitivity Sn(%) = 70.6, and overall accuracy Ac(%) = 87.7. In the external validation series, it achieved Sp(%) = 90.1, Sn(%) = 71.4, and Ac(%) = 87.8. The model uses PT operators based on multicondition moving averages to capture the complexity of the data set. We also compared the model with nonlinear artificial neural network (ANN) models, which gave similar results. This confirms the hypothesis of a linear relationship between the PT operators and the classification of compounds as anticancer agents under different combinations of assay conditions. Last, we compared the model with other PTML models reported in the literature and concluded that it is the only PTML model able to predict activity against multiple types of cancer. The model is a simple but versatile tool for predicting the targets of anticancer compounds, taking into consideration multiple combinations of experimental conditions in preclinical assays.
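The abstract names two computable ingredients: PT operators built from multicondition moving averages, and the Sn/Sp/Ac classification statistics. The following is a minimal sketch of both, not the authors' implementation. It assumes the operator takes the deviation form often used in the PTML literature, ΔD(c(j)) = D − ⟨D(c(j))⟩, where ⟨D(c(j))⟩ is the mean of a molecular descriptor D over all assays that share the same value of condition c(j). The function names, the descriptor name `logp`, and the toy condition labels are hypothetical and do not come from this record.

```python
from collections import defaultdict
from statistics import mean


def moving_average_deviations(records, descriptor, conditions):
    """Return, per record, descriptor minus its mean over records sharing each condition.

    Assumed operator form: dD(c_j) = D - <D(c_j)>.
    """
    # Collect descriptor values for every (condition name, condition label) pair.
    groups = defaultdict(list)
    for rec in records:
        for cond in conditions:
            groups[(cond, rec[cond])].append(rec[descriptor])
    averages = {key: mean(values) for key, values in groups.items()}
    # One deviation per condition for each record.
    return [
        {cond: rec[descriptor] - averages[(cond, rec[cond])] for cond in conditions}
        for rec in records
    ]


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy in percent (1 = active, 0 = inactive).

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "Sn": 100 * tp / (tp + fn),
        "Sp": 100 * tn / (tn + fp),
        "Ac": 100 * (tp + tn) / len(pairs),
    }


if __name__ == "__main__":
    # Toy assays: c0 = activity parameter, c1 = target (labels are made up).
    toy = [
        {"logp": 2.0, "c0": "IC50", "c1": "T1"},
        {"logp": 4.0, "c0": "IC50", "c1": "T2"},
        {"logp": 3.0, "c0": "GI50", "c1": "T1"},
    ]
    print(moving_average_deviations(toy, "logp", ["c0", "c1"]))
    print(classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

In a workflow of this kind, the deviations would serve as input variables to a linear classifier, and the metrics would be computed separately on the training and external validation series.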
Pages: 621-632
Number of pages: 12