QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem

Cited by: 93
Authors
Zakharov, Alexey V. [1]
Peach, Megan L. [2]
Sitzmann, Markus [1]
Nicklaus, Marc C. [1]
Affiliations
[1] NCI, CADD Grp, Biol Chem Lab, Ctr Canc Res, NIH, DHHS, NCI Frederick, 376 Boyles St, Frederick, MD 21702 USA
[2] Frederick Natl Lab Canc Res, Basic Sci Program, Leidos Biomed Inc, Comp Aided Drug Design Grp, Chem Biol Lab, Frederick, MD 21702 USA
Funding
National Institutes of Health (NIH), USA
Keywords
PIPELINE PILOT; RANDOM FOREST; PREDICTION
DOI
10.1021/ci400737s
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and "biological" descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap).
Pages: 705-712
Number of pages: 8