Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction

Cited by: 0
Authors
Matthew C. Robinson
Robert C. Glen
Alpha A. Lee
Affiliations
[1] Department of Physics, University of Cambridge
[2] The Centre for Molecular Informatics, Department of Chemistry, University of Cambridge
[3] Computational and Systems Medicine, Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Machine learning methods may have the potential to significantly accelerate drug discovery. However, the increasing rate of new methodological approaches being published in the literature raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of area under the receiver operating characteristic curve as a metric in virtual screening. We further suggest that area under the precision–recall curve should be used in conjunction with the receiver operating characteristic curve. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross validation.
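To make the metric argument in the abstract concrete, the sketch below is a minimal, hypothetical example (not code from the paper): on a class-imbalanced toy screening set, two simulated score distributions can differ only slightly in ROC-AUC while differing substantially in precision-recall AUC, which is why the authors recommend reporting the latter alongside the former. The model setup and distribution parameters are invented for illustration; only NumPy and scikit-learn's roc_auc_score and average_precision_score are assumed.

    # Toy, hypothetical illustration: with ~1% actives, a model that lets a small
    # fraction of inactives score like actives loses little ROC-AUC but a large
    # share of its precision-recall AUC.
    import numpy as np
    from sklearn.metrics import roc_auc_score, average_precision_score

    rng = np.random.default_rng(0)
    n_inactive, n_active = 9900, 100                 # screening-like imbalance (~1% actives)
    y_true = np.r_[np.zeros(n_inactive), np.ones(n_active)]
    actives = rng.normal(1.5, 1.0, n_active)         # both models rank actives equally well on average

    # Model A: inactives drawn from a single well-behaved background distribution.
    inactives_a = rng.normal(0.0, 1.0, n_inactive)

    # Model B: same background, but ~2% of inactives act as hard decoys and score highly.
    inactives_b = rng.normal(0.0, 1.0, n_inactive)
    decoys = rng.random(n_inactive) < 0.02
    inactives_b[decoys] += rng.normal(3.0, 0.5, decoys.sum())

    for name, inactives in [("A", inactives_a), ("B", inactives_b)]:
        scores = np.r_[inactives, actives]
        print(f"Model {name}: ROC-AUC = {roc_auc_score(y_true, scores):.3f}, "
              f"PR-AUC = {average_precision_score(y_true, scores):.3f}")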
Pages: 717-730
Number of pages: 13