Calibrating random forests for probability estimation

Cited by: 37
Authors
Dankowski, Theresa [1]
Ziegler, Andreas [1, 2, 3, 4]
Affiliations
[1] Univ Lubeck, Univ Klinikum Schleswig Holstein, Inst Med Biometrie & Stat, Campus Lubeck, Lubeck, Germany
[2] Univ Lubeck, Zentrum Klin Studien, Lubeck, Germany
[3] DZHK German Ctr Cardiovasc Res, Hamburg Kiel Lubeck Partner Site, Lubeck, Germany
[4] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Pietermaritzburg, South Africa
Keywords
calibration; logistic regression; probability estimation; probability machine; random forests; updating; PREDICTION; VALIDATION; SCORE;
DOI
10.1002/sim.6959
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. (C) 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Pages: 3949-3960
Number of pages: 12
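
The abstract names Elkan's method as one updating strategy but does not state its formula. The following is a minimal sketch, assuming the method meant is the base-rate adjustment from Elkan's work on cost-sensitive learning, which rescales a predicted probability when the outcome prevalence changes between the development and the target setting. The function name `adjust_for_base_rate` and its parameter names are hypothetical, and this is not the authors' implementation.

```python
def adjust_for_base_rate(p, b_old, b_new):
    """Rescale a predicted probability p for a changed base rate.

    Assumption (not stated in the abstract): the adjustment is
        p' = b_new * (p - p * b_old)
             / (b_old - p * b_old + b_new * p - b_old * b_new)
    where b_old is the outcome prevalence in the development data and
    b_new is the prevalence in the target setting. This is equivalent to
    multiplying the odds of p by the ratio of new to old base-rate odds.
    """
    if not (0.0 < b_old < 1.0 and 0.0 < b_new < 1.0):
        raise ValueError("base rates must lie strictly between 0 and 1")
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must be a probability in [0, 1]")
    numerator = b_new * (p - p * b_old)
    denominator = b_old - p * b_old + b_new * p - b_old * b_new
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical numbers: a model developed where half of the patients
    # had the outcome is applied where only 20% do.
    for p in (0.1, 0.5, 0.9):
        print(p, adjust_for_base_rate(p, b_old=0.5, b_new=0.2))
```

The adjustment depends only on the two prevalences, so it presupposes that the predictor distribution within each outcome class stays the same across settings. This is presumably the kind of strict assumption the abstract refers to when it recommends the logistic regression-based approach for cases where that assumption fails.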