A tree approach for variable selection and its random forest

Authors
Liu, Yu [1]
Qin, Xu [1]
Cai, Zhibo [2, 3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Math Sci, 2006 Xiyuan Ave, Chengdu 611731, Sichuan, Peoples R China
[2] Renmin Univ China, Ctr Appl Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[3] Renmin Univ China, Sch Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Binary partition; Classification and regression tree; Mutual information; Random forests; Sure independence screening; Models
DOI
10.1016/j.csda.2024.108068
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Sure Independence Screening (SIS) provides a fast and efficient ranking of variable importance in ultra-high-dimensional regression. However, classical SIS cannot eliminate false importance in the ranking, a problem that is exacerbated in nonparametric settings. To address this problem, a novel screening approach is proposed that partitions the sample into subsets sequentially, creating a tree-like structure of sub-samples called SIS-tree. SIS-tree is straightforward to implement and can be integrated with various measures of dependence. Theoretical results are established to support this approach, including its "sure screening property". Additionally, SIS-tree is extended to a forest with improved performance. Simulations demonstrate that the proposed methods improve substantially on existing SIS methods. The selection of a cutoff for the screening is also investigated through theoretical justification and experimental study. As a direct application, classification of high-dimensional data is considered, and the screening and cutoff are found to substantially improve the performance of existing classifiers.
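For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of classical SIS only, assuming absolute marginal Pearson correlation as the dependence measure. It is not the SIS-tree or forest procedure, whose details are not given in this record. The function names `sis_rank` and `sis_select`, the simulated data, and the cutoff `d = 20` are hypothetical choices made for illustration.

```python
import numpy as np


def sis_rank(X, y):
    """Rank the columns of X by absolute marginal Pearson correlation with y.

    Illustrative baseline only (assumed dependence measure: Pearson correlation).
    Returns the column indices sorted from most to least important, plus scores.
    """
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the response
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, np.inf, denom)  # constant columns get score 0
    score = np.abs(Xc.T @ yc) / denom
    order = np.argsort(-score)       # descending by score
    return order, score


def sis_select(X, y, d):
    """Keep the d top-ranked variables (d is a user-chosen cutoff)."""
    order, _ = sis_rank(X, y)
    return order[:d]


# Simulated example (hypothetical data): p >> n, only columns 0 and 1 matter.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

selected = sis_select(X, y, d=20)
```

With independent predictors, as simulated here, the two active variables are expected to rank at the top. The false importance mentioned in the abstract arises when inactive predictors are correlated with active ones and therefore also receive high marginal scores.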
Pages: 19