Double random forest

Cited by: 43
Authors
Han, Sunwoo [1]
Kim, Hyunjoong [2]
Lee, Yung-Seop [3]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, Seattle, WA 98006 USA
[2] Yonsei Univ, Dept Appl Stat, Seoul 03722, South Korea
[3] Dongguk Univ, Dept Stat, Seoul 04620, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Classification; Ensemble; Random forest; Bootstrap; Decision tree; CLASSIFICATION TREES; ALGORITHMS; ENSEMBLES;
DOI
10.1007/s10994-020-05889-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as classifiers. One of the hyper-parameters to choose from for RF fitting is the nodesize, which determines the individual tree size. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully by minimizing the nodesize parameter. This observation leads to the idea that prediction accuracy could be further improved if we find a way to generate even bigger trees than the ones with a minimum nodesize. In other words, the largest tree created with the minimum nodesize parameter may not be sufficiently large for the best performance of RF. To produce bigger trees than those by RF, we propose a new classification ensemble method called double random forest (DRF). The new method uses bootstrap on each node during the tree creation process, instead of just bootstrapping once on the root node as in RF. This method, in turn, provides an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data where RF does not produce trees of sufficient size, we have successfully demonstrated that DRF provides more accurate predictions than RF.
Pages: 1569-1586
Page count: 18
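
The node-level bootstrap described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, beyond what the abstract states, that each node draws a bootstrap resample of its own data to choose the split and then applies that split to the node's full data, that splits minimise Gini impurity, and that a random subset of `mtry` features is tried per node. All function names (`gini`, `best_split`, `grow`, `fit_forest`, `predict_forest`) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    n = len(y)
    best, best_score = None, gini(y)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def grow(X, y, rng, mtry, nodesize=1):
    """Grow one tree; class labels must not be tuples (tuples mark inner nodes)."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or len(y) <= nodesize:
        return majority
    # Assumption: bootstrap the data at THIS node, not only once at the root.
    idx = [rng.randrange(len(y)) for _ in range(len(y))]
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    features = rng.sample(range(len(X[0])), mtry)
    split = best_split(Xb, yb, features)
    if split is None:
        return majority
    f, t = split
    # Assumption: the chosen split is applied to the node's full (unresampled) data.
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    if not left or not right:
        return majority
    return (
        f,
        t,
        grow([X[i] for i in left], [y[i] for i in left], rng, mtry, nodesize),
        grow([X[i] for i in right], [y[i] for i in right], rng, mtry, nodesize),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, mtry=None, seed=0):
    """Fit n_trees trees, each grown on the full training data."""
    rng = random.Random(seed)
    mtry = mtry or max(1, int(len(X[0]) ** 0.5))
    return [grow(X, y, rng, mtry) for _ in range(n_trees)]


def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]
```

Calling `fit_forest(X, y)` on a list of numeric feature rows `X` and labels `y` returns a list of trees, and `predict_forest(forest, row)` returns the majority-vote class for one row. Because every node sees its full data when the chosen split is applied, a node can keep more distinct observations than it would under a single root-level bootstrap, which is one plausible reading of how the method yields larger trees.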