Calibrating random forests for probability estimation

Cited by: 37
Authors
Dankowski, Theresa [1]
Ziegler, Andreas [1, 2, 3, 4]
Affiliations
[1] Univ Lubeck, Univ Klinikum Schleswig Holstein, Inst Med Biometrie & Stat, Campus Lubeck, Lubeck, Germany
[2] Univ Lubeck, Zentrum Klin Studien, Lubeck, Germany
[3] DZHK German Ctr Cardiovasc Res, Hamburg Kiel Lubeck Partner Site, Lubeck, Germany
[4] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Pietermaritzburg, South Africa
Keywords
calibration; logistic regression; probability estimation; probability machine; random forests; updating; PREDICTION; VALIDATION; SCORE
DOI
10.1002/sim.6959
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. (C) 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Pages: 3949-3960
Page count: 12
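
The abstract names Elkan's method for updating probability estimates but does not state its formula. As a minimal sketch, assuming this refers to the base-rate adjustment commonly attributed to Elkan (2001), and assuming that only the outcome prevalence differs between the original and the new center, an estimated probability can be rescaled as follows. The function name and arguments are hypothetical and are not taken from the paper.

```python
def adjust_for_base_rate(p, b_old, b_new):
    """Rescale probability p, estimated under base rate b_old, to base rate b_new.

    Assumption: only the outcome prevalence differs between the two
    populations; the class-conditional feature distributions are unchanged.
    """
    if not (0.0 < b_old < 1.0 and 0.0 < b_new < 1.0):
        raise ValueError("base rates must lie strictly between 0 and 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    # Equivalent to multiplying the odds of p by the ratio of new to old prior odds.
    numerator = b_new * (p - p * b_old)
    denominator = b_old - p * b_old + b_new * p - b_old * b_new
    return numerator / denominator


if __name__ == "__main__":
    # A forest trained where 50% of patients had the outcome, applied to a
    # center where only 20% do (illustrative numbers, not from the paper).
    print(adjust_for_base_rate(0.5, b_old=0.5, b_new=0.2))
```

When the prevalence assumption fails, this rescaling can be miscalibrated, which is the situation in which the abstract reports that the logistic regression-based approach performed better.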