Consistent and unbiased variable selection under independent features using Random Forest permutation importance

Cited by: 2
Authors
Ramosaj, Burim [1 ]
Pauly, Markus [1 ]
Affiliations
[1] Tech Univ Dortmund, Inst Math Stat & Applicat Ind, Fac Stat, Joseph Von Fraunhofer Str 2-4, D-44227 Dortmund, Germany
Keywords
Random Forest; permutation importance; unbiasedness; consistency; Out-of-Bag samples; statistical learning
DOI
10.3150/22-BEJ1534
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Variable selection in sparse regression models is an important task, as applications ranging from biomedical research to econometrics have shown. Especially for higher-dimensional regression problems, in which the regression function as the link between response and covariates cannot be directly detected, the selection of informative variables is challenging. Under these circumstances, the Random Forest method is a helpful tool for predicting new outcomes while delivering measures for variable selection. One common approach is the use of the permutation importance. Due to its intuitive idea and flexible usage, it is important to explore the circumstances under which the permutation importance based on Random Forest correctly indicates informative covariates. Regarding the latter, we deliver theoretical guarantees for the validity of the permutation importance measure under specific assumptions, such as the mutual independence of the features, and prove its (asymptotic) unbiasedness, while under slightly stricter assumptions, consistency of the permutation importance measure is established. An extensive simulation study supports our findings.
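The following is a minimal simulation sketch (not the authors' code) illustrating the setting described in the abstract: a sparse regression model with mutually independent features, a bagged ensemble of regression trees, and a Breiman-style out-of-bag (OOB) permutation importance computed by hand. The function name oob_permutation_importance, the sample sizes, and the coefficient values are illustrative assumptions, not taken from the paper.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def oob_permutation_importance(X, y, n_trees=300, max_features="sqrt"):
    # Breiman-style OOB permutation importance: for each tree, compare the
    # OOB mean squared error before and after permuting a single feature.
    n, p = X.shape
    importances = np.zeros((n_trees, p))
    for b in range(n_trees):
        boot = rng.integers(0, n, size=n)            # bootstrap sample indices
        oob = np.setdiff1d(np.arange(n), boot)       # out-of-bag indices
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=b).fit(X[boot], y[boot])
        base_err = np.mean((y[oob] - tree.predict(X[oob])) ** 2)
        for j in range(p):
            X_perm = X[oob].copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break feature j's link to y
            perm_err = np.mean((y[oob] - tree.predict(X_perm)) ** 2)
            importances[b, j] = perm_err - base_err        # error inflation for feature j
    return importances.mean(axis=0)

# Sparse linear model with 10 mutually independent features; only the first
# three are informative, so their importances should be clearly positive.
n, p = 500, 10
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] + 1.5 * X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=n)
print(np.round(oob_permutation_importance(X, y), 3))

Under the independence assumption, the importances of the seven noise features should concentrate around zero while the informative features receive positive values; the sketch only illustrates this behaviour and does not reproduce the paper's estimator exactly (for instance, scikit-learn's tree construction differs in details from Breiman's original algorithm).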
Pages: 2101-2118
Number of pages: 18
Related papers
50 in total
  • [21] NEARLY UNBIASED VARIABLE SELECTION UNDER MINIMAX CONCAVE PENALTY
    Zhang, Cun-Hui
    ANNALS OF STATISTICS, 2010, 38 (02): : 894 - 942
  • [22] Similarity based on the importance of common features in random forest
    Chen X.
    Han L.
    Leng M.
    Pan X.
    International Journal of Performability Engineering, 2019, 15 (04) : 1171 - 1180
  • [23] A tree approach for variable selection and its random forest
    Liu, Yu
    Qin, Xu
    Cai, Zhibo
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2025, 202
  • [24] Random forest for ordinal responses: Prediction and variable selection
    Janitza, Silke
    Tutz, Gerhard
    Boulesteix, Anne-Laure
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2016, 96 : 57 - 73
  • [25] Variable ranking and selection with random forest for unbalanced data
    Bradter, Ute
    Altringham, John D.
    Kunin, William E.
    Thom, Tim J.
    O'Connell, Jerome
    Benton, Tim G.
    ENVIRONMENTAL DATA SCIENCE, 2022, 1
  • [26] VARIABLE IMPORTANCE AND RANDOM FOREST CLASSIFICATION USING RADARSAT-2 POLSAR DATA
    Hariharan, Siddharth
    Tirodkar, Siddhesh
    De, Shaunak
    Bhattacharya, Avik
    2014 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2014, : 1210 - 1213
  • [27] Variable Importance Measure System Based on Advanced Random Forest
    Song, Shufang
    He, Ruyang
    Shi, Zhaoyin
    Zhang, Weiya
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2021, 128 (01): : 65 - 85
  • [28] MMD-based Variable Importance for Distributional Random Forest
    Benard, Clement
    Naf, Jeffrey
    Josse, Julie
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [29] Features Selection in Character Recognition with Random Forest Classifier
    Homenda, Wladyslaw
    Lesinski, Wojciech
    COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, PT I, 2011, 6922 : 93 - +
  • [30] Variable selection using random forests
    Genuer, Robin
    Poggi, Jean-Michel
    Tuleau-Malot, Christine
    PATTERN RECOGNITION LETTERS, 2010, 31 (14) : 2225 - 2236