Consistent and unbiased variable selection under independent features using Random Forest permutation importance

Cited by: 2
Authors
Ramosaj, Burim [1 ]
Pauly, Markus [1 ]
Affiliations
[1] Tech Univ Dortmund, Inst Math Stat & Applicat Ind, Fac Stat, Joseph Von Fraunhofer Str 2-4, D-44227 Dortmund, Germany
Keywords
Random Forest; permutation importance; unbiasedness; consistency; Out-of-Bag samples; statistical learning
DOI
10.3150/22-BEJ1534
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
Variable selection in sparse regression models is an important task, as applications ranging from biomedical research to econometrics have shown. Especially for higher-dimensional regression problems, in which the regression function linking response and covariates cannot be directly detected, the selection of informative variables is challenging. Under these circumstances, the Random Forest method is a helpful tool to predict new outcomes while delivering measures for variable selection. One common approach is the use of the permutation importance. Due to its intuitive idea and flexible usage, it is important to explore the circumstances under which the permutation importance based on Random Forest correctly indicates informative covariates. Regarding the latter, we deliver theoretical guarantees for the validity of the permutation importance measure under specific assumptions such as the mutual independence of the features and prove its (asymptotic) unbiasedness, while under slightly stricter assumptions, consistency of the permutation importance measure is established. An extensive simulation study supports our findings.
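As a rough illustration of the Out-of-Bag permutation importance described in the abstract, the following minimal sketch grows a small bagged forest of regression trees in plain Python and measures, per feature, how much the Out-of-Bag mean squared error rises when that feature is permuted. It is not the authors' implementation: the helper names (fit_tree, predict, oob_permutation_importance), the tree depth, the number of trees, the feature-subsampling rule, and the simulated data (four independent uniform features, of which only the first two enter the response) are all assumptions chosen for illustration.

```python
import random


def fit_tree(X, y, rng, depth=0, max_depth=4, min_leaf=5):
    """Grow a small regression tree (hypothetical helper).

    Returns a float (leaf mean) or a tuple (feature, threshold, left, right).
    """
    n, p = len(y), len(X[0])
    mean = sum(y) / n
    if depth >= max_depth or n < 2 * min_leaf:
        return mean
    total = sum(y)
    best = None
    # Random feature subset at each node (an assumed "mtry" of p // 2).
    for j in rng.sample(range(p), max(1, p // 2)):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += y[order[k - 1]]
            if k < min_leaf or n - k < min_leaf:
                continue
            if X[order[k - 1]][j] == X[order[k]][j]:
                continue  # no valid cut between tied values
            # Maximising this score is equivalent to minimising the split SSE.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, X[order[k - 1]][j])
    if best is None:
        return mean
    _, j, t = best
    left = [i for i in range(n) if X[i][j] <= t]
    right = [i for i in range(n) if X[i][j] > t]
    return (
        j,
        t,
        fit_tree([X[i] for i in left], [y[i] for i in left], rng, depth + 1, max_depth, min_leaf),
        fit_tree([X[i] for i in right], [y[i] for i in right], rng, depth + 1, max_depth, min_leaf),
    )


def predict(tree, row):
    """Route one observation down the tree to its leaf mean."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def oob_permutation_importance(X, y, n_trees=50, seed=0):
    """Average per-tree increase in OOB squared error after permuting each feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    totals, used = [0.0] * p, 0
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(bag)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        tree = fit_tree([X[i] for i in bag], [y[i] for i in bag], rng)
        base = sum((y[i] - predict(tree, X[i])) ** 2 for i in oob) / len(oob)
        for j in range(p):
            donors = oob[:]
            rng.shuffle(donors)  # permute feature j among the OOB rows only
            err = 0.0
            for i, d in zip(oob, donors):
                row = list(X[i])
                row[j] = X[d][j]
                err += (y[i] - predict(tree, row)) ** 2
            totals[j] += err / len(oob) - base
        used += 1
    return [v / used for v in totals]


# Simulated data (assumption): independent features, only x0 and x1 are informative.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(4)] for _ in range(200)]
y = [3 * r[0] - 2 * r[1] + data_rng.gauss(0, 0.1) for r in X]
importance = oob_permutation_importance(X, y)
print(importance)
```

In this toy setting the two informative features should receive clearly larger importance values than the two noise features, whose values should stay close to zero; the exact numbers depend on the seeds and tuning choices assumed above.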
Pages: 2101-2118
Number of pages: 18