Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction

Authors
Matthew C. Robinson
Robert C. Glen
Alpha A. Lee
Affiliations
[1] Department of Physics, University of Cambridge
[2] The Centre for Molecular Informatics, Department of Chemistry, University of Cambridge
[3] Computational and Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College
Abstract
Machine learning methods may have the potential to significantly accelerate drug discovery. However, the increasing rate of new methodological approaches being published in the literature raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of area under the receiver operating characteristic curve as a metric in virtual screening. We further suggest that area under the precision–recall curve should be used in conjunction with the receiver operating characteristic curve. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross validation.
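The abstract's concern about the receiver operating characteristic metric can be illustrated with a small sketch. It is not taken from the paper: the data set (1000 compounds, 10 actives), the ranking, and the function names are hypothetical, and the area under the precision-recall curve is approximated by average precision. It assumes a model that ranks 100 inactives above every active, which is a poor outcome for virtual screening, where only the top of the list is tested.

```python
def auroc(labels, scores):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each active
    return total / hits


# Hypothetical toy screen: ranks 1-100 inactive, 101-110 active, 111-1000 inactive.
labels = [0] * 100 + [1] * 10 + [0] * 890
# Strictly decreasing scores so list order equals model ranking.
scores = [float(len(labels) - i) for i in range(len(labels))]

roc = auroc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUROC = {roc:.3f}, average precision = {ap:.3f}")
```

In this toy case the AUROC is close to 0.9 even though no active appears among the first 100 ranked compounds, while the average precision stays below 0.1. That gap is one way to see why the two metrics are better reported together.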
Pages: 717 - 730
Page count: 13