A tree approach for variable selection and its random forest

Cited by: 0
Authors
Liu, Yu [1]
Qin, Xu [1]
Cai, Zhibo [2, 3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Math Sci, 2006 Xiyuan Ave, Chengdu 611731, Sichuan, Peoples R China
[2] Renmin Univ China, Ctr Appl Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[3] Renmin Univ China, Sch Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Binary partition; Classification and regression tree; Mutual information; Random forests; Sure independence screening; Models
DOI
10.1016/j.csda.2024.108068
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Sure Independence Screening (SIS) provides a fast and efficient ranking of variable importance in ultra-high-dimensional regression. However, classical SIS cannot eliminate false importance in the ranking, a problem that is exacerbated in nonparametric settings. To address this problem, a novel screening approach called SIS-tree is proposed, which partitions the sample into subsets sequentially and creates a tree-like structure of sub-samples. SIS-tree is straightforward to implement and can be integrated with various measures of dependence. Theoretical results are established to support this approach, including its "sure screening property". Additionally, SIS-tree is extended to a forest with improved performance. Simulations demonstrate that the proposed methods improve substantially on existing SIS methods. The selection of a cutoff for the screening is also investigated through theoretical justification and experimental study. As a direct application, classification of high-dimensional data is considered, and the screening and cutoff are found to substantially improve the performance of existing classifiers.
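For orientation, the classical SIS baseline that the abstract contrasts against can be sketched as follows. This is only a minimal illustration of marginal screening with absolute Pearson correlation, which is an assumed choice of dependence measure; it is not the SIS-tree procedure of the paper. The function names `sis_rank` and `sis_select`, the cutoff `d`, and the simulated data are hypothetical.

```python
import numpy as np

def sis_rank(X, y):
    """Rank predictors by absolute marginal Pearson correlation with y."""
    Xc = X - X.mean(axis=0)          # centre each column
    yc = y - y.mean()                # centre the response
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    # Constant columns get a score of 0 instead of a division by zero.
    corr = np.divide(Xc.T @ yc, denom,
                     out=np.zeros(X.shape[1]), where=denom > 0)
    scores = np.abs(corr)
    order = np.argsort(-scores)      # indices from most to least important
    return order, scores

def sis_select(X, y, d):
    """Keep the d top-ranked predictors (d is a user-chosen cutoff)."""
    order, _ = sis_rank(X, y)
    return order[:d]

# Simulated example (assumed setup): 1000 predictors, only columns 0 and 5 matter.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.standard_normal(n)
selected = sis_select(X, y, d=20)
```

Each predictor is scored on the full sample independently of the others, which is what makes the ranking fast. The abstract's proposal differs in that it partitions the sample sequentially and screens within the resulting sub-samples.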
Pages: 19
Related papers
50 in total
  • [1] A Hybrid Random Forest Variable Selection Approach for Omics Data
    Fouodo, Cesaire J. K.
    Koenig, Inke R.
    Szymczak, Silke
    GENETIC EPIDEMIOLOGY, 2022, 46 (07) : 494 - 494
  • [2] Variable selection for estimating individual tree height using genetic algorithm and random forest
    Miranda, Evandro Nunes
    Groenner Barbosa, Bruno Henrique
    Godinho Silva, Sergio Henrique
    Ussi Monti, Cassio Augusto
    Tng, David Yue Phin
    Gomide, Lucas Rezende
    FOREST ECOLOGY AND MANAGEMENT, 2022, 504
  • [3] Forward variable selection for random forest models
    Velthoen, Jasper
    Cai, Juan-Juan
    Jongbloed, Geurt
    JOURNAL OF APPLIED STATISTICS, 2023, 50 (13) : 2836 - 2856
  • [4] Random forest for ordinal responses: Prediction and variable selection
    Janitza, Silke
    Tutz, Gerhard
    Boulesteix, Anne-Laure
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2016, 96 : 57 - 73
  • [5] Variable ranking and selection with random forest for unbalanced data
    Bradter, Ute
    Altringham, John D.
    Kunin, William E.
    Thom, Tim J.
    O'Connell, Jerome
    Benton, Tim G.
    ENVIRONMENTAL DATA SCIENCE, 2022, 1
  • [6] A new variable selection approach using Random Forests
    Hapfelmeier, A.
    Ulm, K.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2013, 60 : 50 - 69
  • [7] A random forest approach for interval selection in functional regression
    Servien, Remi
    Vialaneix, Nathalie
    STATISTICAL ANALYSIS AND DATA MINING, 2024, 17 (04)
  • [8] Random Forest Feature Selection Approach for Image Segmentation
    Lefkovits, Laszlo
    Lefkovits, Szidonia
    Emerich, Simina
    Vaida, Mircea Florin
    NINTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2016), 2017, 10341
  • [9] Random Forest using tree selection method to classify unbalanced data
    Xu, Baoxun
    Ye, Yunming
    Wang, Qiang
    Li, Junjie
    Chen, Xiaojun
    FOURTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2012), 2012, 8334
  • [10] Multivariable Fingerprints With Random Forest Variable Selection for Indoor Positioning System
    Ji, Wenqing
    Zhao, Kun
    Zheng, Zhengqi
    Yu, Chao
    Huang, Shuai
    IEEE SENSORS JOURNAL, 2022, 22 (06) : 5398 - 5406