A bias-variance analysis of state-of-the-art random forest text classifiers

Cited by: 0
Authors
Thiago Salles
Leonardo Rocha
Marcos Gonçalves
Affiliations
[1] Federal University of Minas Gerais
[2] Federal University of São João Del Rei
Keywords
Random forests; Text classification; Bias variance analysis; 62K25; 62F86
DOI
Not available
Abstract
Random forest (RF) classifiers excel in a variety of automatic classification tasks, such as topic categorization and sentiment analysis. Despite these strengths, RF models have been shown to perform poorly on noisy data, which is common in text. Several RF variants have been proposed to generalize better in such challenging scenarios, including lazy, boosted and randomized forests, all of which exhibit significant reductions in error rate compared with traditional RFs. In this work, we analyze the behavior of these variants under the bias-variance decomposition of the error rate. Such an analysis is essential to uncover the main causes of the improvements in classification effectiveness observed for those variants. As we shall see, significant reductions in variance, together with stable bias, explain a large portion of the improvements for the lazy and boosted RF variants. The analysis also sheds light on promising directions for further enhancing RF-based learners, such as introducing new sources of randomization into both the lazy and boosted variants.
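The abstract does not say which bias-variance decomposition of the error rate is used. As a rough illustration only, the sketch below assumes a Domingos-style decomposition for noise-free 0-1 loss: each test example gets a "main prediction" (the majority label across models trained on different resampled training sets), bias is a main prediction that misses the true label, and variance is disagreement with the main prediction. The function name `bias_variance_01` and the input layout are hypothetical and are not taken from the paper.

```python
from collections import Counter


def bias_variance_01(predictions, targets):
    """Estimate 0-1 loss bias and variance (assumed Domingos-style, no noise term).

    predictions[k][i] is the label predicted for test example i by the model
    trained on the k-th resampled training set; targets[i] is the true label.
    """
    n_models = len(predictions)
    n_examples = len(targets)
    error = bias = unbiased_var = biased_var = 0.0

    for i, target in enumerate(targets):
        votes = [predictions[k][i] for k in range(n_models)]
        # Main prediction: the most frequent label across the trained models.
        main_pred, _ = Counter(votes).most_common(1)[0]
        # Variance: how often individual models deviate from the main prediction.
        disagreement = sum(v != main_pred for v in votes) / n_models
        error += sum(v != target for v in votes) / n_models
        if main_pred == target:
            # Unbiased example: every deviation is an error.
            unbiased_var += disagreement
        else:
            # Biased example: with two classes, every deviation is a correct guess.
            bias += 1.0
            biased_var += disagreement

    return {
        "error": error / n_examples,
        "bias": bias / n_examples,
        "unbiased_variance": unbiased_var / n_examples,
        "biased_variance": biased_var / n_examples,
    }


if __name__ == "__main__":
    # Toy data: 4 models, 3 test examples, two classes.
    toy_targets = ["pos", "neg", "pos"]
    toy_predictions = [
        ["pos", "neg", "neg"],
        ["pos", "pos", "neg"],
        ["pos", "neg", "neg"],
        ["neg", "neg", "pos"],
    ]
    print(bias_variance_01(toy_predictions, toy_targets))
```

Under these assumptions, and for two-class problems only, the average error equals bias plus unbiased variance minus biased variance. A lower error at a similar bias therefore points to a lower net variance, which is the kind of effect the abstract reports for the lazy and boosted variants.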
Pages: 379-405
Page count: 26
Source
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2021, 15 (02) : 379 - 405