Assessing agreement between permutation and dropout variable importance methods for regression and random forest models

Authors
Bladen, Kelvyn [1]
Cutler, Richard [1]
Affiliations
[1] Utah State Univ, Dept Math & Stat, 3900 Old Main Hill, Logan, UT 84322 USA
Source
ELECTRONIC RESEARCH ARCHIVE | 2024 / Vol. 32 / Issue 07
Keywords
permutation; variable importance; random forest; variable selection; regression; machine learning; SELECTION;
DOI
10.3934/era.2024203
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Permutation techniques have been used extensively in machine learning algorithms for evaluating variable importance. In ordinary regression, however, variables are often removed to gauge their importance. In this paper, we compared the results of permuting variables with those of removing variables in regression to assess the relation between these two methods. Specifically, we compared permute-and-predict (PaP) methods with leave-one-covariate-out (LOCO) techniques. We also compared these results with conventional metrics such as regression coefficient estimates, t-statistics, and random forest out-of-bag (OOB) PaP importance. Our results indicate that permutation importance metrics are practically equivalent to those obtained from removing variables in a regression setting. We demonstrate a strong association between the PaP metrics, true coefficients, and regression-estimated coefficients. We also show a strong relation between the LOCO metrics and the regression t-statistics. Finally, we illustrate that manual PaP methods are not equivalent to the OOB PaP technique and suggest prioritizing the use of manual PaP methods on validation data.
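The two importance measures contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the simulated data, the coefficient values, and all function names (`fit_ols`, `pap_importance`, `loco_importance`) are assumptions made for illustration. PaP permutes one column of the validation data and re-scores the already fitted model; LOCO refits the model without that column and scores the reduced model.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))


def pap_importance(beta, X_val, y_val, rng, n_repeats=20):
    """Permute-and-predict: mean increase in validation MSE after shuffling column j."""
    base = mse(y_val, predict(beta, X_val))
    scores = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += mse(y_val, predict(beta, X_perm)) - base
    return scores / n_repeats


def loco_importance(X_tr, y_tr, X_val, y_val):
    """Leave-one-covariate-out: increase in validation MSE after refitting without column j."""
    base = mse(y_val, predict(fit_ols(X_tr, y_tr), X_val))
    p = X_tr.shape[1]
    scores = np.zeros(p)
    for j in range(p):
        keep = [k for k in range(p) if k != j]
        beta_reduced = fit_ols(X_tr[:, keep], y_tr)
        scores[j] = mse(y_val, predict(beta_reduced, X_val[:, keep])) - base
    return scores


# Simulated data (an assumption for this sketch): independent predictors,
# one strong, one weak, and one with no effect.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
true_beta = np.array([3.0, 1.0, 0.0])
y = X @ true_beta + rng.normal(size=n)

# Hold out half of the rows as validation data.
X_tr, X_val = X[:1000], X[1000:]
y_tr, y_val = y[:1000], y[1000:]

beta_hat = fit_ols(X_tr, y_tr)
pap = pap_importance(beta_hat, X_val, y_val, rng)
loco = loco_importance(X_tr, y_tr, X_val, y_val)

print("estimated slopes:", beta_hat[1:])
print("PaP importance:  ", pap)
print("LOCO importance: ", loco)
```

In this simulated setting the two measures should rank the predictors in the same order. Both are computed on held-out rows, in line with the abstract's suggestion to prioritize manual PaP on validation data; the random forest OOB variant is not covered by this sketch.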
Pages: 4495 - 4514
Number of pages: 20
Related papers
50 in total
  • [31] Statistical methods in assessing agreement: Models, issues, and tools
    Lin, L
    Hedayat, AS
    Sinha, B
    Yang, M
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (457) : 257 - 270
  • [32] Approximating Prediction Uncertainty for Random Forest Regression Models
    Coulston, John W.
    Blinn, Christine E.
    Thomas, Valerie A.
    Wynne, Randolph H.
    PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, 2016, 82 (03) : 189 - 197
  • [33] Forecasting Monthly Water Deficit Based on Multi-Variable Linear Regression and Random Forest Models
    Li, Yi
    Wei, Kangkang
    Chen, Ke
    He, Jianqiang
    Zhao, Yong
    Yang, Guang
    Yao, Ning
    Niu, Ben
    Wang, Bin
    Wang, Lei
    Feng, Puyu
    Yang, Zhe
    WATER, 2023, 15 (06)
  • [34] Variable Importance Measure System Based on Advanced Random Forest
    Song, Shufang
    He, Ruyang
    Shi, Zhaoyin
    Zhang, Weiya
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2021, 128 (01) : 65 - 85
  • [35] MMD-based Variable Importance for Distributional Random Forest
    Benard, Clement
    Naf, Jeffrey
    Josse, Julie
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [36] A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression
    Gerstorfer Y.
    Hahn-Klimroth M.
    Krieg L.
    Data Science Journal, 2023, 22 (01)
  • [37] Random forest regression feature importance for climate impact pathway detection
    Brown, Meredith G. L.
    Peterson, Matt G.
    Tezaur, Irina K.
    Peterson, Kara
    Bull, Diana L.
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2025, 464
  • [38] ASSESSING THE IMPORTANCE OF AN INDEPENDENT VARIABLE IN MULTIPLE-REGRESSION - IS STEPWISE UNWISE
    LEIGH, JP
    JOURNAL OF CLINICAL EPIDEMIOLOGY, 1988, 41 (07) : 669 - 677
  • [39] Daily Evapotranspiration Mapping Using Regression Random Forest Models
    Gonzalo-Martin, Consuelo
    Lillo-Saavedra, Mario
    Garcia-Pedrero, Angel
    Lagos, Octavio
    Menasalvas, Ernestina
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2017, 10 (12) : 5359 - 5368
  • [40] Bias in random forest variable importance measures: Illustrations, sources and a solution
    Strobl, Carolin
    Boulesteix, Anne-Laure
    Zeileis, Achim
    Hothorn, Torsten
    BMC BIOINFORMATICS, 2007, 8 (1)