Assessing agreement between permutation and dropout variable importance methods for regression and random forest models

Authors
Bladen, Kelvyn [1]
Cutler, Richard [1]
Affiliations
[1] Utah State Univ, Dept Math & Stat, 3900 Old Main Hill, Logan, UT 84322 USA
Source
Electronic Research Archive | 2024 / Vol. 32 / No. 07
Keywords
permutation; variable importance; random forest; variable selection; regression; machine learning; SELECTION;
DOI
10.3934/era.2024203
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Permutation techniques have been used extensively in machine learning algorithms for evaluating variable importance. In ordinary regression, however, variables are often removed to gauge their importance. In this paper, we compared the results of permuting variables with those of removing variables in regression to assess the relation between these two methods. We compared permute-and-predict (PaP) methods with leave-one-covariate-out (LOCO) techniques. We also compared these results with conventional metrics such as regression coefficient estimates, t-statistics, and random forest out-of-bag (OOB) PaP importance. Our results indicate that permutation importance metrics are practically equivalent to those obtained from removing variables in a regression setting. We demonstrate a strong association between the PaP metrics, true coefficients, and regression-estimated coefficients. We also show a strong relation between the LOCO metrics and the regression t-statistics. Finally, we illustrate that manual PaP methods are not equivalent to the OOB PaP technique and suggest prioritizing the use of manual PaP methods on validation data.
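To make the two techniques named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes an ordinary least-squares model, mean squared error as the loss, and simulated data with independent features; all function names (`fit_ols`, `pap_importance`, `loco_importance`) and the simulation settings are hypothetical choices made for illustration. PaP permutes one column of the validation data and re-predicts with the already fitted model, while LOCO refits the model without that column.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return beta[0] + X @ beta[1:]

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def pap_importance(X_train, y_train, X_val, y_val, rng, n_repeats=20):
    """Permute-and-predict: increase in validation MSE after shuffling column j.
    The model is fitted once and never refitted."""
    beta = fit_ols(X_train, y_train)
    base = mse(y_val, predict_ols(beta, X_val))
    scores = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += mse(y_val, predict_ols(beta, X_perm)) - base
    return scores / n_repeats

def loco_importance(X_train, y_train, X_val, y_val):
    """Leave-one-covariate-out: increase in validation MSE after refitting
    the model without column j."""
    base = mse(y_val, predict_ols(fit_ols(X_train, y_train), X_val))
    p = X_val.shape[1]
    scores = np.zeros(p)
    for j in range(p):
        keep = [k for k in range(p) if k != j]
        beta_j = fit_ols(X_train[:, keep], y_train)
        scores[j] = mse(y_val, predict_ols(beta_j, X_val[:, keep])) - base
    return scores

# Simulated data (assumption): true coefficients 3, 1, and 0 (pure noise feature).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=2000)
X_tr, X_va, y_tr, y_va = X[:1000], X[1000:], y[:1000], y[1000:]

pap = pap_importance(X_tr, y_tr, X_va, y_va, rng)
loco = loco_importance(X_tr, y_tr, X_va, y_va)
print("PaP :", np.round(pap, 3))
print("LOCO:", np.round(loco, 3))
```

Both scores are computed on held-out validation rows, in line with the abstract's suggestion to prioritize manual PaP on validation data. In this toy setting the two methods should rank the features in the same order, although their magnitudes need not match.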
Pages: 4495-4514
Number of pages: 20