Optimal Feature Set Size in Random Forest Regression

Cited by: 28
Authors
Han, Sunwoo [1]
Kim, Hyunjoong [2]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, Seattle, WA 98006 USA
[2] Yonsei Univ, Dept Appl Stat, Seoul 03722, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 8
Funding
National Research Foundation of Singapore
Keywords
random forest; feature set size; grid search; regression; prediction
DOI
10.3390/app11083428
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
One of the most important hyper-parameters in the Random Forest (RF) algorithm is the feature set size used to search for the best partitioning rule at each node of the trees. Most existing research on feature set size has focused primarily on classification problems. We studied the effect of feature set size in the context of regression. Through experimental studies using many datasets, we first investigated whether RF regression predictions are affected by the feature set size. Then, we found a rule associated with the optimal size based on the characteristics of each dataset. Lastly, we developed a search algorithm for estimating the best feature set size in RF regression. We showed that the proposed search algorithm can provide improvements over other choices, such as using the default size specified in the randomForest R package and using the common grid search method.
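For context, the default size referred to in the abstract is, for regression in the randomForest R package, mtry = max(floor(p/3), 1), where p is the number of features. The Python sketch below contrasts that default with a common grid search over candidate feature set sizes. It is not the authors' proposed search algorithm, which the abstract does not describe; the function names, the evenly spaced grid, and the `error_fn` callback are illustrative assumptions. In practice `error_fn` might return an out-of-bag or cross-validated mean squared error for a forest fitted with the given feature set size.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def default_mtry_regression(p: int) -> int:
    """Default feature set size of the randomForest R package for regression:
    max(floor(p / 3), 1), where p is the number of features."""
    if p < 1:
        raise ValueError("p must be a positive number of features")
    return max(p // 3, 1)


def candidate_grid(p: int, n_points: int = 5) -> List[int]:
    """Hypothetical helper: evenly spaced candidate sizes in [1, p],
    rounded to integers and deduplicated."""
    if p < 1 or n_points < 1:
        raise ValueError("p and n_points must be positive")
    if n_points == 1:
        return [default_mtry_regression(p)]
    step = (p - 1) / (n_points - 1)
    return sorted({round(1 + i * step) for i in range(n_points)})


def grid_search(error_fn: Callable[[int], float],
                candidates: Iterable[int]) -> Tuple[int, float]:
    """Evaluate every candidate size once and return (best size, its error).
    Ties are broken in favour of the smaller size."""
    # Insertion order is ascending, so min() returns the smallest tied size.
    errors: Dict[int, float] = {m: error_fn(m) for m in sorted(set(candidates))}
    if not errors:
        raise ValueError("no candidate sizes given")
    best = min(errors, key=errors.get)
    return best, errors[best]


if __name__ == "__main__":
    p = 12
    # Hypothetical toy error curve standing in for an OOB or CV error estimate.
    toy_error = lambda m: (m - 7) ** 2 + 0.5
    grid = candidate_grid(p, n_points=5)
    print(default_mtry_regression(p))   # 4
    print(grid)                         # [1, 4, 6, 9, 12]
    print(grid_search(toy_error, grid))  # (6, 1.5)
```

In scikit-learn, the analogous hyper-parameter is the `max_features` argument of `RandomForestRegressor`, so an `error_fn` could be built by fitting that estimator with `max_features=m` and scoring it on held-out or out-of-bag data.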
Pages: 13