Assessing agreement between permutation and dropout variable importance methods for regression and random forest models

Authors
Bladen, Kelvyn [1]
Cutler, Richard [1]
Affiliations
[1] Utah State Univ, Dept Math & Stat, 3900 Old Main Hill, Logan, UT 84322 USA
Source
ELECTRONIC RESEARCH ARCHIVE | 2024 / Volume 32 / Issue 07
Keywords
permutation; variable importance; random forest; variable selection; regression; machine learning; SELECTION;
DOI
10.3934/era.2024203
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Permutation techniques have been used extensively in machine learning algorithms for evaluating variable importance. In ordinary regression, however, variables are often removed to gauge their importance. In this paper, we compared the results of permuting variables with those of removing variables in regression to assess the relation between these two methods. We compared permute-and-predict (PaP) methods with leave-one-covariate-out (LOCO) techniques. We also compared these results with conventional metrics such as regression coefficient estimates, t-statistics, and random forest out-of-bag (OOB) PaP importance. Our results indicate that permutation importance metrics are practically equivalent to those obtained from removing variables in a regression setting. We demonstrate a strong association between the PaP metrics, true coefficients, and regression-estimated coefficients. We also show a strong relation between the LOCO metrics and the regression t-statistics. Finally, we illustrate that manual PaP methods are not equivalent to the OOB PaP technique and suggest prioritizing the use of manual PaP methods on validation data.
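The two approaches contrasted in the abstract can be illustrated with a minimal sketch. This is not the authors' code or experimental design: the simulated data, the coefficient vector `beta`, the train/validation split, and the helper names (`fit_ols`, `pap_importance`, `loco_importance`) are all assumptions made for illustration. It fits an ordinary least-squares model, then scores each predictor by the increase in validation mean squared error when that column is permuted (PaP) and when the model is refit without it (LOCO).

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

def pap_importance(coef, X_val, y_val, rng, n_repeats=20):
    """Permute-and-predict: mean increase in validation MSE after
    shuffling one column at a time; the model is NOT refit."""
    base = mse(y_val, predict(coef, X_val))
    scores = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        increases = []
        for _ in range(n_repeats):
            Xp = X_val.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            increases.append(mse(y_val, predict(coef, Xp)) - base)
        scores[j] = np.mean(increases)
    return scores

def loco_importance(X_tr, y_tr, X_val, y_val):
    """Leave-one-covariate-out: increase in validation MSE after
    dropping one column and refitting the model."""
    base = mse(y_val, predict(fit_ols(X_tr, y_tr), X_val))
    scores = np.zeros(X_tr.shape[1])
    for j in range(X_tr.shape[1]):
        keep = [k for k in range(X_tr.shape[1]) if k != j]
        coef_j = fit_ols(X_tr[:, keep], y_tr)
        scores[j] = mse(y_val, predict(coef_j, X_val[:, keep])) - base
    return scores

# Simulated data (an assumption): independent standard-normal predictors,
# hypothetical true coefficients, unit-variance noise.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))
beta = np.array([3.0, 1.5, 0.5, 0.0])
y = X @ beta + rng.normal(size=n)

# Importance is scored on held-out validation data.
X_tr, X_val = X[:1000], X[1000:]
y_tr, y_val = y[:1000], y[1000:]

coef = fit_ols(X_tr, y_tr)
pap = pap_importance(coef, X_val, y_val, rng)
loco = loco_importance(X_tr, y_tr, X_val, y_val)

print("PaP importance: ", np.round(pap, 3))
print("LOCO importance:", np.round(loco, 3))
print("PaP ranking: ", np.argsort(-pap))
print("LOCO ranking:", np.argsort(-loco))
```

Comparing the two printed rankings gives a quick check of whether the methods order the predictors the same way in this toy setting. With correlated predictors the two scores can behave differently, so this sketch should not be read as reproducing the paper's findings.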
Pages: 4495 - 4514
Number of pages: 20
Related papers
50 in total
  • [11] Remote fossil prospecting in the Cradle of Humankind: Assessing variable importance for cave site prediction using Random Forest models
    Furtner, Margaret J.
    Anemone, Robert L.
    Wang, Lei
    Caruana, Matthew V.
    Lombard, Marlize
    Brophy, Juliet K.
    AMERICAN JOURNAL OF BIOLOGICAL ANTHROPOLOGY, 2024, 183 : 55 - 55
  • [12] Permutation methods in relative risk regression models
    Jiang, Wenyu
    Kalbfleisch, John D.
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2008, 138 (02) : 416 - 431
  • [13] Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology
    Eric W. Fox
    Ryan A. Hill
    Scott G. Leibowitz
    Anthony R. Olsen
    Darren J. Thornbrugh
    Marc H. Weber
    Environmental Monitoring and Assessment, 2017, 189
  • [14] Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology
    Fox, Eric W.
    Hill, Ryan A.
    Leibowitz, Scott G.
    Olsen, Anthony R.
    Thornbrugh, Darren J.
    Weber, Marc H.
    ENVIRONMENTAL MONITORING AND ASSESSMENT, 2017, 189 (07)
  • [15] Estimating neuronal variable importance with random forest
    Oh, J
    Laubach, M
    Luczak, A
    PROCEEDINGS OF THE IEEE 29TH ANNUAL NORTHEAST BIOENGINEERING CONFERENCE, 2003, : 33 - 34
  • [16] Variable selection by permutation applied in support vector regression models
    da Cunha, Pedro H. P.
    de Paulo, Ellisson H.
    Folli, Gabriely Silveira
    Nascimento, Marcia H. C.
    Moro, Mariana K.
    Filgueiras, Paulo R.
    JOURNAL OF CHEMOMETRICS, 2022, 36 (10)
  • [17] Efficient permutation testing of variable importance measures by the example of random forests
    Hapfelmeier, Alexander
    Hornung, Roman
    Haller, Bernhard
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2023, 181
  • [18] Assessing Agreement between Methods of Measurement
    Altman, Douglas G.
    Bland, J. Martin
    CLINICAL CHEMISTRY, 2017, 63 (10) : 1653 - 1654
  • [19] A Traffic Event Detection Method Based on Random Forest and Permutation Importance
    Su, Ziyi
    Liu, Qingchao
    Zhao, Chunxia
    Sun, Fengming
    MATHEMATICS, 2022, 10 (06)
  • [20] An AUC-based permutation variable importance measure for random forests
    Silke Janitza
    Carolin Strobl
    Anne-Laure Boulesteix
    BMC Bioinformatics, 14