Assessing model fit by cross-validation

Cited by: 637
Authors
Hawkins, DM [1]
Basak, SC
Mills, D
Affiliations
[1] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Nat Resources Res Inst, Duluth, MN 55811 USA
DOI
10.1021/ci025626i
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
When QSAR models are fitted, it is important to validate any fitted model, that is, to check that it is plausible that its predictions will carry over to fresh data not used in the model-fitting exercise. There are two standard ways of doing this: using a separate hold-out test sample, or using the computationally much more burdensome leave-one-out cross-validation, in which the entire pool of available compounds is used both to fit the model and to assess its validity. We show by theoretical argument and empirical study of a large QSAR data set that when the available sample size is small (in the dozens or scores rather than the hundreds), holding a portion of it back for testing is wasteful, and that it is much better to use cross-validation, provided that this is done properly.
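The two validation strategies named in the abstract can be contrasted in a minimal sketch. This is not the authors' procedure or data: it assumes a synthetic data set of 40 samples, ordinary least squares as a stand-in for a QSAR model, and hypothetical helper names (`fit_ols`, `predict_ols`, `loo_q2`, `holdout_r2`). Leave-one-out cross-validation refits the model n times, each time predicting the single omitted sample, while the hold-out approach fits once on a reduced training set and scores on the remainder.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X] (illustrative stand-in model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def loo_q2(X, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i            # every sample except the i-th
        coef = fit_ols(X[keep], y[keep])    # refit without sample i
        pred = predict_ols(coef, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2         # squared error on the omitted sample
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total

def holdout_r2(X, y, n_test, rng):
    """Predictive R^2 on a random hold-out split of n_test samples."""
    idx = rng.permutation(len(y))
    test, train = idx[:n_test], idx[n_test:]
    coef = fit_ols(X[train], y[train])
    resid = y[test] - predict_ols(coef, X[test])
    ss_total = np.sum((y[test] - y[train].mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / ss_total

# Synthetic data: an assumption for illustration only, not the paper's data set.
rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=n)

q2 = loo_q2(X, y)
holdout_scores = np.array([holdout_r2(X, y, 10, rng) for _ in range(200)])

print(f"LOO q^2 (all {n} samples used): {q2:.3f}")
print(f"Hold-out R^2 over 200 random splits: "
      f"mean {holdout_scores.mean():.3f}, sd {holdout_scores.std():.3f}")
```

The spread of the hold-out score across random splits gives a rough sense of how noisy a single small test set can be. The leave-one-out estimate uses every sample both for fitting and for assessment.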
Pages: 579-586
Number of pages: 8