Consistent and unbiased variable selection under independent features using Random Forest permutation importance

Times cited: 2
Authors
Ramosaj, Burim [1]
Pauly, Markus [1]
Affiliation
[1] Tech Univ Dortmund, Inst Math Stat & Applicat Ind, Fac Stat, Joseph Von Fraunhofer Str 2-4, D-44227 Dortmund, Germany
Keywords
Random Forest; permutation importance; unbiasedness; consistency; Out-of-Bag samples; statistical learning
DOI
10.3150/22-BEJ1534
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Variable selection in sparse regression models is an important task, as applications ranging from biomedical research to econometrics have shown. Especially for higher-dimensional regression problems, in which the regression function linking the response and the covariates cannot be directly detected, the selection of informative variables is challenging. Under these circumstances, the Random Forest method is a helpful tool for predicting new outcomes while also delivering measures for variable selection. One common approach is the use of the permutation importance. Because of its intuitive idea and flexible usage, it is important to explore the circumstances under which the Random Forest permutation importance correctly indicates informative covariates. To this end, we deliver theoretical guarantees for the validity of the permutation importance measure under specific assumptions, such as the mutual independence of the features, and prove its (asymptotic) unbiasedness; under slightly stricter assumptions, we also establish the consistency of the permutation importance measure. An extensive simulation study supports our findings.
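The central object of the abstract, the Random Forest permutation importance computed on Out-of-Bag (OOB) samples, can be illustrated with a small NumPy-only sketch. It assumes Breiman's per-tree formulation: for each tree, permute one feature within that tree's OOB rows and record the increase in mean squared error, then average over trees. The forest is a hand-rolled CART-style regression forest, and the helper names (`fit_tree`, `predict_tree`, `oob_permutation_importance`), the hyperparameters (`mtry`, `max_depth`, `min_leaf`), and the toy model Y = 2 X1 + X2 + noise with independent Gaussian features are all hypothetical. This is not the authors' estimator or their simulation design.

```python
import numpy as np


def fit_tree(X, y, rng, mtry, max_depth, min_leaf=5):
    """Grow a CART-style regression tree (hypothetical helper).
    Leaves are floats; inner nodes are tuples (feature, threshold, left, right)."""
    n = len(y)
    if max_depth == 0 or n < 2 * min_leaf:
        return float(y.mean())
    best_sse, best = np.inf, None
    # Random Forest randomisation: only mtry randomly drawn features are tried per node.
    for j in rng.choice(X.shape[1], size=mtry, replace=False):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
        k = np.arange(min_leaf, n - min_leaf + 1)  # candidate left-child sizes
        left_sse = csq[k - 1] - csum[k - 1] ** 2 / k
        right_sse = (csq[-1] - csq[k - 1]) - (csum[-1] - csum[k - 1]) ** 2 / (n - k)
        sse = left_sse + right_sse
        sse[xs[k - 1] == xs[k]] = np.inf  # no split between tied values
        i = int(np.argmin(sse))
        if sse[i] < best_sse:
            best_sse, best = sse[i], (j, (xs[k[i] - 1] + xs[k[i]]) / 2)
    if best is None:
        return float(y.mean())
    j, thr = best
    mask = X[:, j] <= thr
    return (j, thr,
            fit_tree(X[mask], y[mask], rng, mtry, max_depth - 1, min_leaf),
            fit_tree(X[~mask], y[~mask], rng, mtry, max_depth - 1, min_leaf))


def predict_tree(node, X):
    """Route all rows of X through the tree at once using boolean masks."""
    if not isinstance(node, tuple):
        return np.full(len(X), node)
    j, thr, left, right = node
    out = np.empty(len(X))
    mask = X[:, j] <= thr
    out[mask] = predict_tree(left, X[mask])
    out[~mask] = predict_tree(right, X[~mask])
    return out


def oob_permutation_importance(X, y, n_trees=100, mtry=2, max_depth=8, seed=0):
    """Per-tree OOB permutation importance: mean increase in OOB MSE per feature."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    importance, used = np.zeros(p), 0
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)       # bootstrap sample, drawn with replacement
        oob = np.setdiff1d(np.arange(n), boot)  # rows never drawn are out-of-bag
        if len(oob) == 0:
            continue
        tree = fit_tree(X[boot], y[boot], rng, mtry, max_depth)
        X_oob, y_oob = X[oob], y[oob]
        base_mse = np.mean((y_oob - predict_tree(tree, X_oob)) ** 2)
        for j in range(p):
            X_perm = X_oob.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link X_j <-> Y
            importance[j] += np.mean((y_oob - predict_tree(tree, X_perm)) ** 2) - base_mse
        used += 1
    return importance / used


# Toy design (assumption): five mutually independent features, only X1 and X2 informative.
data_rng = np.random.default_rng(1)
n = 400
X = data_rng.normal(size=(n, 5))
y = 2 * X[:, 0] + X[:, 1] + 0.5 * data_rng.normal(size=n)
imp = oob_permutation_importance(X, y)
print(np.round(imp, 2))
```

Under this independent design, the noise features X3 to X5 should receive importances scattered around zero, while X1 and X2 receive clearly positive values, with X1 ranked first; this is the qualitative behaviour the unbiasedness result describes. The sketch says nothing about correlated features, which fall outside the assumptions stated in the abstract.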
Pages: 2101-2118
Number of pages: 18