A tree approach for variable selection and its random forest

Cited by: 0
Authors
Liu, Yu [1]
Qin, Xu [1]
Cai, Zhibo [2, 3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Math Sci, 2006 Xiyuan Ave, Chengdu 611731, Sichuan, Peoples R China
[2] Renmin Univ China, Ctr Appl Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[3] Renmin Univ China, Sch Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Binary partition; Classification and regression tree; Mutual information; Random forests; Sure independence screening; Models
DOI
10.1016/j.csda.2024.108068
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Sure Independence Screening (SIS) provides a fast and efficient ranking of variable importance in ultra-high-dimensional regression. However, classical SIS cannot eliminate false importance in the ranking, a problem that is exacerbated in nonparametric settings. To address this problem, a novel screening approach called SIS-tree is proposed, which partitions the sample into subsets sequentially and creates a tree-like structure of sub-samples. SIS-tree is straightforward to implement and can be integrated with various measures of dependence. Theoretical results are established to support this approach, including its "sure screening property". Additionally, SIS-tree is extended to a forest with improved performance. Simulations show that the proposed methods improve substantially on existing SIS methods. The selection of a cutoff for the screening is also investigated through theoretical justification and experimental study. As a direct application, classification of high-dimensional data is considered, and the screening and cutoff are found to substantially improve the performance of existing classifiers.
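For orientation, the classical SIS baseline mentioned in the abstract can be sketched as a ranking of predictors by absolute marginal correlation with the response. This is only a minimal illustration of that baseline under stated assumptions, not the SIS-tree procedure of the paper, whose details are not given in this record. The function name `sis_rank`, the cutoff `d=20`, and the simulated data are all hypothetical choices made for the example.

```python
import numpy as np

def sis_rank(X, y, d):
    """Rank predictors by absolute marginal Pearson correlation with y.

    Returns the indices of the top-d predictors and the full vector of
    absolute correlations. Hypothetical helper, for illustration only.
    """
    Xc = X - X.mean(axis=0)          # center each column
    yc = y - y.mean()                # center the response
    x_norm = np.sqrt((Xc ** 2).sum(axis=0))
    x_norm[x_norm == 0] = np.inf     # constant columns get correlation 0
    y_norm = np.sqrt((yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / (x_norm * y_norm)
    order = np.argsort(-corr)        # descending by |correlation|
    return order[:d], corr

# Simulated example (assumed setup): only the first two of 1000 predictors matter.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

selected, corr = sis_rank(X, y, d=20)
```

In this linear setting the two active predictors are retained among the 20 selected, but the other 18 selected variables are noise, which is the kind of false importance that a cutoff rule or a refined screening step aims to reduce.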
Pages: 19
Related papers
50 in total
  • [31] Emergence of the giant tree in a random forest
    Cheplyukova, I.A.
    Discrete Mathematics and Applications, 1998, 8 (01): 17-33
  • [32] On the maximal size of tree in a random forest
    Pavlov, Yuriy L.
    DISCRETE MATHEMATICS AND APPLICATIONS, 2024, 34 (04): 221-232
  • [33] AnaData: A Novel Approach for Data Analytics Using Random Forest Tree and SVM
    Devi, Bali
    Kumar, Sarvesh
    Anuradha
    Shankar, Venkatesh Gauri
    COMPUTING, COMMUNICATION AND SIGNAL PROCESSING, ICCASP 2018, 2019, 810: 511-521
  • [34] Genomic selection: a revolutionary approach for forest tree improvement in the wake of climate change
    Umesh Sharma
    H. P. Sankhyan
    Anita Kumari
    Shikha Thakur
    Lalit Thakur
    Divya Mehta
    Sunny Sharma
    Shilpa Sharma
    Neeraj Sankhyan
    Euphytica, 2024, 220
  • [35] Genomic selection: a revolutionary approach for forest tree improvement in the wake of climate change
    Sharma, Umesh
    Sankhyan, H. P.
    Kumari, Anita
    Thakur, Shikha
    Thakur, Lalit
    Mehta, Divya
    Sharma, Sunny
    Sharma, Shilpa
    Sankhyan, Neeraj
    EUPHYTICA, 2024, 220 (01)
  • [36] Audio Content Feature Selection and Classification A random forests and decision tree approach
    Al-Maathidi, Muhammad M.
    Li, Francis F.
    PROCEEDINGS OF 2015 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (IEEE PIC), 2015: 108-112
  • [37] Genomic selection in forest tree breeding
    Dario Grattapaglia
    Marcos D. V. Resende
    Tree Genetics & Genomes, 2011, 7: 241-255
  • [38] Genomic selection in forest tree breeding
    Grattapaglia, Dario
    Resende, Marcos D. V.
    TREE GENETICS & GENOMES, 2011, 7 (02): 241-255
  • [39] A Clustering Approach for Feature Selection in Microarray Data Classification Using Random forest
    Aydadenta, Husna
    Adiwijaya
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2018, 14 (05): 1167-1175
  • [40] PREDICTION OF PIVOTAL RESPONSE TREATMENT OUTCOME WITH TASK FMRI USING RANDOM FOREST AND VARIABLE SELECTION
    Zhuang, Juntang
    Dvornek, Nicha C.
    Li, Xiaoxiao
    Yang, Daniel
    Ventola, Pamela
    Duncan, James S.
    2018 IEEE 15TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2018), 2018: 97-100