Optimal Feature Set Size in Random Forest Regression

Times cited: 28
Authors
Han, Sunwoo [1]
Kim, Hyunjoong [2]
Affiliations
[1] Fred Hutchinson Cancer Research Center, Vaccine and Infectious Disease Division, Seattle, WA 98006, USA
[2] Yonsei University, Department of Applied Statistics, Seoul 03722, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / No. 8
Funding
National Research Foundation of Singapore
Keywords
random forest; feature set size; grid search; regression; prediction
DOI
10.3390/app11083428
Chinese Library Classification
O6 [Chemistry]
Discipline code
0703
Abstract
One of the most important hyper-parameters in the Random Forest (RF) algorithm is the feature set size, i.e., the number of features considered when searching for the best partitioning rule at each node of a tree. Most existing research on feature set size has focused on classification problems. We studied the effect of feature set size in the context of regression. Through experimental studies on many datasets, we first investigated whether RF regression predictions are affected by the feature set size. We then identified a rule relating the optimal size to the characteristics of each dataset. Lastly, we developed a search algorithm for estimating the best feature set size in RF regression. We showed that the proposed search algorithm can improve on other choices, such as the default size specified in the randomForest R package and the common grid search method.
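The hyper-parameter discussed above can be made concrete with a minimal, stdlib-only Python sketch. It is not the authors' proposed search algorithm, which the abstract does not describe; it only illustrates three ingredients the abstract refers to. The first is the default feature set size of the randomForest R package (its `mtry` argument), which is max(floor(p/3), 1) for regression with p features. The second is a node-level split search that looks only at `mtry` randomly drawn features and minimizes the summed squared error of the two child nodes. The third is a plain grid search over candidate sizes. The helper names (`default_mtry`, `best_split`, `grid_search_mtry`) and the `estimate_error` callable, which stands in for something like the out-of-bag or cross-validated error of a forest fitted with a given size, are hypothetical.

```python
import math
import random
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple


def default_mtry(p: int, regression: bool = True) -> int:
    """Default feature set size of the randomForest R package:
    max(floor(p/3), 1) for regression, floor(sqrt(p)) for classification."""
    if regression:
        return max(p // 3, 1)
    return math.isqrt(p)


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (regression node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: Sequence[Sequence[float]], y: Sequence[float], mtry: int,
               rng: random.Random) -> Optional[Tuple[float, int, float]]:
    """Return the best (sse, feature, threshold) split at one node, searching
    only `mtry` features drawn at random without replacement (mtry <= p)."""
    p = len(X[0])
    best = None
    for j in rng.sample(range(p), mtry):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grid_search_mtry(p: int, estimate_error: Callable[[int], float],
                     grid: Optional[Iterable[int]] = None) -> Tuple[int, Dict[int, float]]:
    """Plain grid search: score every candidate size with the hypothetical
    `estimate_error` callable and keep the size with the smallest error."""
    candidates = list(grid) if grid is not None else list(range(1, p + 1))
    errors = {m: estimate_error(m) for m in candidates}
    best = min(errors, key=errors.get)
    return best, errors


if __name__ == "__main__":
    X = [[1, 9], [2, 3], [3, 7], [4, 1]]
    y = [1.0, 1.0, 5.0, 5.0]
    print(default_mtry(len(X[0])))                          # 1
    print(best_split(X, y, mtry=2, rng=random.Random(0)))   # (0.0, 0, 2.5)
```

With both features eligible, the search splits on feature 0 at 2.5, which separates the two response levels exactly. In a full forest this search is repeated at every node of every tree with a fresh random draw, so roughly speaking a small `mtry` raises the chance that a node never sees the informative feature, while a large `mtry` makes the trees more alike; the grid search pays for one forest fit per candidate size.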
Pages: 13