Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction

Authors
Matthew C. Robinson
Robert C. Glen
Alpha A. Lee
Affiliations
[1] Department of Physics, University of Cambridge
[2] The Centre for Molecular Informatics, Department of Chemistry, University of Cambridge
[3] Computational and Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College
Abstract
Machine learning methods may have the potential to significantly accelerate drug discovery. However, the increasing rate of new methodological approaches being published in the literature raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of area under the receiver operating characteristic curve as a metric in virtual screening. We further suggest that area under the precision–recall curve should be used in conjunction with the receiver operating characteristic curve. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross validation.
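The concern about AUROC in virtual screening can be illustrated with a small sketch. It builds two hypothetical rankings of a library with 10 actives among 1,000 compounds (invented numbers, not data from the reanalyzed study). It then computes AUROC as the probability that a random active outscores a random inactive, and AUPRC as average precision, AP = Σ (R_k − R_{k−1}) P_k over descending score thresholds. This is the same step-wise estimator that scikit-learn documents for `average_precision_score`. The helper names `roc_auc`, `average_precision` and `scores_from_ranking` are hypothetical and are not the authors' code.

```python
# Illustrative sketch only: helper names are hypothetical, not from the paper.
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random active > score of a random inactive); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as AP = sum_k (R_k - R_{k-1}) * P_k over descending score thresholds."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one active")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(order):
        # All compounds tied at this score cross the threshold together.
        threshold = scores[order[i]]
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def scores_from_ranking(ranked_labels: List[int]) -> Tuple[List[int], List[float]]:
    """Give a top-to-bottom list of labels strictly decreasing scores (rank 0 scores highest)."""
    n = len(ranked_labels)
    return list(ranked_labels), [float(n - r) for r in range(n)]


if __name__ == "__main__":
    # Hypothetical screen: 10 actives (1) and 990 inactives (0), i.e. 1% actives.
    # Model A: 5 actives at the very top, the other 5 behind 100 inactives.
    model_a = [1] * 5 + [0] * 100 + [1] * 5 + [0] * 890
    # Model B: all 10 actives ranked just behind the 50 top-scoring inactives.
    model_b = [0] * 50 + [1] * 10 + [0] * 940
    for name, ranking in (("A", model_a), ("B", model_b)):
        labels, scores = scores_from_ranking(ranking)
        print(f"Model {name}: AUROC={roc_auc(labels, scores):.3f}, "
              f"AUPRC={average_precision(labels, scores):.3f}")
```

In this toy setup both rankings have exactly the same AUROC (9400/9900, about 0.949). The AUPRC, however, is about 0.54 for model A, which retrieves half its actives first, and only about 0.10 for model B. A random ranking has an expected AUROC of 0.5 and an expected AUPRC close to the fraction of actives (0.01 here), so AUPRC values are best read against that baseline.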
Published in
Journal of Computer-Aided Molecular Design, 2020, 34(7): 717–730