Predicting nearly as well as the best pruning of a decision tree

Cited by: 70
Authors
Helmbold, DP [1 ]
Schapire, RE [1 ]
Institution
[1] AT&T BELL LABS,MURRAY HILL,NJ 07974
Keywords
decision trees; pruning; prediction; on-line learning;
DOI
10.1023/A:1007396710653
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many algorithms for inferring a decision tree from data involve a two-phase process: First, a very large decision tree is grown, which typically ends up "over-fitting" the data. To reduce over-fitting, in the second phase, the tree is pruned using one of a number of available methods. The final tree is then output and used for classification on test data. In this paper, we suggest an alternative approach to the pruning phase. Using a given unpruned decision tree, we present a new method of making predictions on test data, and we prove that our algorithm's performance will not be "much worse" (in a precise technical sense) than the predictions made by the best reasonably small pruning of the given decision tree. Thus, our procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust. Our method can be viewed as a synthesis of two previously studied techniques. First, we apply Cesa-Bianchi et al.'s (1993) results on predicting using "expert advice" (where we view each pruning as an "expert") to obtain an algorithm that has provably low prediction loss, but that is computationally infeasible. Next, we generalize and apply a method developed by Buntine (1990, 1992) and Willems, Shtarkov and Tjalkens (1993, 1995) to derive a very efficient implementation of this procedure.
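To make the first ingredient concrete, here is a minimal sketch of the generic "expert advice" framework the abstract refers to (an exponentially weighted forecaster in the style of Cesa-Bianchi et al.), where each "expert" would correspond to one pruning of the tree. This is only the naive, exponential-size scheme the paper starts from; the paper's efficient tree-structured implementation is not reproduced here, and the function name, learning rate, and the stand-in expert predictions are illustrative assumptions.

```python
import math

def exp_weights_predict(expert_preds, outcomes, eta=0.5):
    """Run an exponentially weighted forecaster over a prediction sequence.

    expert_preds: list of lists; expert_preds[i][t] in [0, 1] is expert i's
                  prediction at trial t (each expert = one candidate pruning;
                  these stand-in predictions are hypothetical).
    outcomes:     list of true labels in {0, 1}.
    eta:          learning rate (an illustrative choice, not the paper's tuning).
    Returns the forecaster's cumulative absolute loss.
    """
    n = len(expert_preds)
    weights = [1.0] * n  # uniform prior weight on every expert/pruning
    total_loss = 0.0
    for t, y in enumerate(outcomes):
        w_sum = sum(weights)
        # Master prediction: weighted average of the experts' advice.
        p = sum(w * ep[t] for w, ep in zip(weights, expert_preds)) / w_sum
        total_loss += abs(p - y)
        # Multiplicative update: experts that suffer loss lose weight.
        weights = [w * math.exp(-eta * abs(ep[t] - y))
                   for w, ep in zip(weights, expert_preds)]
    return total_loss
```

Run against a sequence on which one expert is always correct, the forecaster's cumulative loss stays within an additive term of that best expert's loss, which is the "not much worse than the best pruning" guarantee the abstract claims; the paper's contribution is achieving this without enumerating the exponentially many prunings.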
Pages: 51-68 (18 pages)
Related papers (50 total)
  • [31] CC4.5: cost-sensitive decision tree pruning
    Cai, J
    Durkin, J
    Cai, Q
    Data Mining VI: Data Mining, Text Mining and Their Business Applications, 2005, : 239 - 245
  • [32] Decision tree's pruning algorithm based on deficient data sets
    Zhang, Y
    Chi, ZX
    Wang, DG
    PDCAT 2005: Sixth International Conference on Parallel and Distributed Computing, Applications and Technologies, Proceedings, 2005, : 1030 - 1032
  • [33] A self-learning algorithm for decision tree pre-pruning
    Yin, DS
    Wang, GY
    Wu, Y
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 2140 - 2145
  • [34] Fast and reliable color region merging inspired by decision tree pruning
    Nock, R
    2001 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2001, : 271 - 276
  • [35] A Comparative Study of Reduced Error Pruning Method in Decision Tree Algorithms
    Omar, W. Mohd Abdul
    Salleh, Mohd Najib Mohd
    Omar, Abdul Halim
    2012 IEEE INTERNATIONAL CONFERENCE ON CONTROL SYSTEM, COMPUTING AND ENGINEERING (ICCSCE 2012), 2012, : 392 - 397
  • [36] Predicting students' satisfaction using a decision tree
    Skrbinjek, Vesna
    Dermol, Valerij
    TERTIARY EDUCATION AND MANAGEMENT, 2019, 25 (02) : 101 - 113
  • [38] Predicting Protein Function using Decision Tree
    Singh, Manpreet
    Wadhwa, Parminder Kaur
    Kaur, Surinder
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 29, 2008, 29 : 350 - +
  • [39] Predicting cesarean delivery with decision tree models
    Sims, CJ
    Meyn, L
    Caruana, R
    Rao, RB
    Mitchell, T
    Krohn, M
    AMERICAN JOURNAL OF OBSTETRICS AND GYNECOLOGY, 2000, 183 (05) : 1198 - 1206
  • [40] BEST: a decision tree algorithm that handles missing values
    Cédric Beaulac
    Jeffrey S. Rosenthal
    Computational Statistics, 2020, 35 : 1001 - 1026