On weak base hypotheses and their implications for boosting regression and classification

Cited by: 24
Author
Jiang, WX [1]
Affiliation
[1] Northwestern Univ, Dept Stat, Evanston, IL 60208 USA
Source
ANNALS OF STATISTICS | 2002 / Vol. 30 / No. 01
Keywords
angular span; boosting; classification; error bounds; least squares regression; matching pursuit; nearest neighbor rule; overfit; prediction error; regularization; training error; weak hypotheses;
DOI
10.1214/aos/1015362184
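Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714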
Abstract
When studying the training error and the prediction error for boosting, it is often assumed that the hypotheses returned by the base learner are weakly accurate, that is, able to beat a random guesser by a certain amount. It has been an open question how large this amount can be: whether it will eventually disappear in the boosting process or remain bounded below by a positive amount. This question is crucial for the behavior of both the training error and the prediction error. In this paper we study this problem and show affirmatively that the amount of improvement over the random guesser will be at least a positive amount for almost all possible sample realizations and for most of the commonly used base hypotheses. This has a number of implications for the prediction error, including, for example, that boosting forever may not be good and regularization may be necessary. The problem is studied by first considering an analog of AdaBoost in regression, where we study similar properties and find that, for good performance, one cannot hope to avoid regularization by just adapting the boosting device to regression.
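The quantity at the center of the abstract is the per-round edge of the base hypothesis over a random guesser, γ_t = 1/2 − ε_t, where ε_t is its weighted training error, together with its regression analog. As a minimal sketch, assuming decision stumps as the base learner and a hypothetical toy dataset (neither is taken from the paper), the Python below runs discrete AdaBoost while recording γ_t and the normalizers Z_t, whose product bounds the training error. It then runs least-squares boosting in its matching-pursuit form and records the cosine of the angle between the current residual and each chosen base function. The helper names (`best_stump`, `adaboost`, `matching_pursuit`) are illustrative only, and the cosine is offered in the spirit of the "angular span" keyword, not as the paper's exact definition.

```python
import math
import random

# --- Classification: discrete AdaBoost with decision stumps (assumed base learner) ---

def stump_predict(stump, x):
    j, thr, pol = stump
    return pol if x[j] <= thr else -pol

def best_stump(X, y, w):
    """Return (weighted error, stump) for the stump with the lowest weighted error."""
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):  # both polarities, so the best error is <= 1/2
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, thr, pol), xi) != yi)
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best_err, best

def adaboost(X, y, rounds):
    """Records the edge 1/2 - eps_t and the normalizer Z_t of every round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble, edges, zs = [], [], []
    for _ in range(rounds):
        eps, stump = best_stump(X, y, w)
        eps = min(max(eps, 1e-12), 1 - 1e-12)  # guard against log(0) and division by zero
        edges.append(0.5 - eps)                # improvement over a random guesser
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # Misclassified points are up-weighted, correctly classified ones down-weighted.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        zs.append(z)
        w = [wi / z for wi in w]
    return ensemble, edges, zs

def predict(ensemble, x):
    score = sum(alpha * stump_predict(st, x) for alpha, st in ensemble)
    return 1 if score >= 0 else -1

# --- Regression: least-squares boosting as matching pursuit over a fixed dictionary ---

def matching_pursuit(dictionary, target, rounds):
    """Each round, pick the base vector with the largest |cos(angle)| to the residual
    and subtract its least-squares projection. Returns residual norms and cosines."""
    r = list(target)
    norms, cosines = [math.sqrt(sum(v * v for v in r))], []
    for _ in range(rounds):
        rn = norms[-1]
        if rn == 0.0:
            break
        best_cos, best_g, best_coef = 0.0, None, 0.0
        for g in dictionary:
            gn2 = sum(v * v for v in g)
            ip = sum(a * b for a, b in zip(r, g))
            c = abs(ip) / (math.sqrt(gn2) * rn)
            if c > best_cos:
                best_cos, best_g, best_coef = c, g, ip / gn2
        if best_g is None:  # residual orthogonal to every base vector
            break
        r = [a - best_coef * b for a, b in zip(r, best_g)]
        cosines.append(best_cos)
        norms.append(math.sqrt(sum(v * v for v in r)))
    return norms, cosines

# --- Hypothetical toy data: a diagonal class boundary that no single stump fits ---
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [1 if a + b > 1.0 else -1 for a, b in X]

ensemble, edges, zs = adaboost(X, y, rounds=20)
train_err = sum(predict(ensemble, x) != yi for x, yi in zip(X, y)) / len(X)
bound = math.prod(zs)
print(f"min edge {min(edges):.3f}; training error {train_err:.3f} <= bound {bound:.3f}")

# Dictionary: every +/-1 stump evaluated on the sample (each has squared norm n > 0).
dictionary = [[1.0 if x[j] <= thr else -1.0 for x in X]
              for j in range(2) for thr in sorted({x[j] for x in X})]
norms, cosines = matching_pursuit(dictionary, [a - b * b for a, b in X], rounds=20)
print(f"residual norm {norms[0]:.3f} -> {norms[-1]:.3f}; min |cos| {min(cosines):.3f}")
```

For 0 < ε_t < 1 the AdaBoost step gives Z_t = 2√(ε_t(1 − ε_t)) = √(1 − 4γ_t²) ≤ exp(−2γ_t²), so if every edge stays above some γ > 0 the training error is at most exp(−2Tγ²) after T rounds. In the regression loop each step multiplies the squared residual norm by 1 − cos²θ_t. Watching whether `edges` and `cosines` stay away from zero as `rounds` grows is a small-scale way to probe the question the abstract raises; it says nothing by itself about the prediction error.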
Pages: 51-73
Number of pages: 23
Related papers
50 in total
  • [21] Boosting ridge for the extreme learning machine globally optimised for classification and regression problems
    Peralez-Gonzalez, Carlos
    Perez-Rodriguez, Javier
    Duran-Rosal, Antonio M.
    SCIENTIFIC REPORTS, 2023, 13 (01):
  • [22] Application of Boosting Classification and Regression to Modeling the Relationships Between Trace Elements and Diseases
    Chao Tan
    Hui Chen
    Wanping Zhu
    Biological Trace Element Research, 2010, 134 : 146 - 159
  • [23] Application of Boosting Classification and Regression to Modeling the Relationships Between Trace Elements and Diseases
    Tan, Chao
    Chen, Hui
    Zhu, Wanping
    BIOLOGICAL TRACE ELEMENT RESEARCH, 2010, 134 (02) : 146 - 159
  • [24] MULTIPLE RANK REGRESSION BASE METHOD FOR THE CLASSIFICATION OF SAR IMAGES
    Ma, CongHui
    Wen, GongJian
    Huang, XiaoHong
    Yang, XiaoLiang
    Ding, BaiYuan
    2016 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2016, : 1532 - 1535
  • [25] Boosting regression methods based on a geometric conversion approach: Using SVMs base learners
    Gao, Feng
    Kou, Peng
    Gao, Lin
    Guan, Xiaohong
    NEUROCOMPUTING, 2013, 113 : 67 - 87
  • [26] Boosting methods for regression
    Duffy, N
    Helmbold, D
    MACHINE LEARNING, 2002, 47 (2-3) : 153 - 200
  • [27] Boosting ridge regression
    Tutz, Gerhard
    Binder, Harald
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (12) : 6044 - 6059
  • [28] Boosting regression estimators
    Avnimelech, R
    Intrator, N
    NEURAL COMPUTATION, 1999, 11 (02) : 499 - 520
  • [29] Boosting Methods for Regression
    Nigel Duffy
    David Helmbold
    Machine Learning, 2002, 47 : 153 - 200
  • [30] On boosting kernel regression
    Di Marzio, Marco
    Taylor, Charles C.
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2008, 138 (08) : 2483 - 2498