Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction

Authors
Matthew C. Robinson
Robert C. Glen
Alpha A. Lee
Affiliations
[1] Department of Physics, University of Cambridge
[2] The Centre for Molecular Informatics, Department of Chemistry, University of Cambridge
[3] Computational and Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College
DOI: not available
Abstract
Machine learning methods may significantly accelerate drug discovery. However, the growing number of new methodological approaches published in the literature raises a fundamental question: how should models be benchmarked and validated? We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion: we show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of the area under the receiver operating characteristic (ROC) curve as a metric in virtual screening, and we suggest that the area under the precision–recall curve should be used alongside the ROC curve. Our numerical experiments also highlight the challenges of estimating the uncertainty in model performance via scaffold-split nested cross-validation.
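To illustrate why the area under the ROC curve (AUROC) alone can be uninformative in virtual screening, the sketch below compares two hypothetical rankings of the same screening library of 1,000 compounds containing 10 actives. These rankings are invented for illustration and are not data from the paper. Model A places half of the actives at the very top of the list and half at the very bottom; model B places all actives in the middle. Both have an AUROC of exactly 0.5, yet only model A would be useful in a screen where only the top-ranked compounds are tested, and the area under the precision–recall curve (AUPRC, estimated here as average precision) separates them. The helpers `roc_auc`, `average_precision` and `ranked` are our own hypothetical pure-Python illustrations, not the authors' code; `average_precision` follows the step-wise definition also used by scikit-learn's `average_precision_score`, which is one common AUPRC estimator and not necessarily the one used in the study.

```python
# Toy comparison of AUROC and AUPRC (average precision) on a hypothetical,
# highly imbalanced screening set: 10 actives among 1,000 compounds.


def roc_auc(y_true, y_score):
    """AUROC as the Mann-Whitney statistic: the fraction of (active, inactive)
    pairs in which the active scores higher; ties count as half a win."""
    pos = [s for s, y in zip(y_score, y_true) if y]
    neg = [s for s, y in zip(y_score, y_true) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise AUPRC estimate: sum over score thresholds of
    (gain in recall) * (precision at that threshold)."""
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    n_pos = sum(1 for y in y_true if y)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Compounds sharing a score are admitted together (one threshold).
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def ranked(labels):
    """Turn labels listed from best- to worst-ranked into (y_true, y_score)
    with strictly decreasing scores."""
    n = len(labels)
    return list(labels), [float(n - i) for i in range(n)]


# Hypothetical model A: 5 actives at the top of the list, 5 at the very bottom.
model_a = ranked([1] * 5 + [0] * 990 + [1] * 5)
# Hypothetical model B: all 10 actives in the middle of the list.
model_b = ranked([0] * 495 + [1] * 10 + [0] * 495)

if __name__ == "__main__":
    for name, (y, s) in [("A", model_a), ("B", model_b)]:
        print(f"model {name}: AUROC={roc_auc(y, s):.3f}  "
              f"AUPRC={average_precision(y, s):.3f}")
```

Running the script reports an AUROC of 0.500 for both models, but an AUPRC of roughly 0.50 for model A versus roughly 0.01 for model B. AUROC counts every correctly ordered active/inactive pair equally, wherever it occurs in the list, whereas average precision weights each active by the precision at its rank and therefore rewards early enrichment. This is a deliberately extreme toy construction and says nothing about the magnitudes reported in the study.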
Pages: 717-730
Page count: 13