Tree induction for probability-based ranking

Cited by: 312
Authors
Provost, F [1]
Domingos, P [2]
Affiliations
[1] NYU, New York, NY 10012 USA
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
ranking; probability estimation; classification; cost-sensitive learning; decision trees; Laplace correction; bagging
DOI
10.1023/A:1024099825458
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Tree induction is one of the most effective and widely used methods for building classification models. However, many applications require cases to be ranked by the probability of class membership. Probability estimation trees (PETs) have the same attractive features as classification trees (e.g., comprehensibility, accuracy, and efficiency in high dimensions and on large data sets). Unfortunately, decision trees have been found to provide poor probability estimates. Several techniques have been proposed to build more accurate PETs, but, to our knowledge, there has not been a systematic experimental analysis of which techniques actually improve the probability-based rankings, and by how much. In this paper we first discuss why the decision-tree representation is not intrinsically inadequate for probability estimation. Inaccurate probabilities are partially the result of decision-tree induction algorithms that focus on maximizing classification accuracy and minimizing tree size (for example, via reduced-error pruning). Larger trees can be better for probability estimation, even if the extra size is superfluous for accuracy maximization. We then present the results of a comprehensive set of experiments, testing some straightforward methods for improving probability-based rankings. We show that using a simple, common smoothing method - the Laplace correction - uniformly improves probability-based rankings. In addition, bagging substantially improves the rankings, and is even more effective for this purpose than for improving accuracy. We conclude that PETs, with these simple modifications, should be considered when rankings based on class-membership probability are required.
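The two modifications named in the abstract can be illustrated with a minimal sketch. It assumes the usual form of the Laplace correction at a leaf, (k + 1) / (n + C), where k is the number of training examples of the class of interest at the leaf, n is the total number of examples at the leaf, and C is the number of classes, and it assumes that bagging combines trees by averaging their per-tree probability estimates. The function names and the toy leaf counts are hypothetical and are not taken from the record above.

```python
from statistics import mean


def raw_estimate(k: int, n: int) -> float:
    """Unsmoothed leaf frequency k / n (illustrative helper)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    return k / n


def laplace_estimate(k: int, n: int, num_classes: int = 2) -> float:
    """Laplace-corrected leaf estimate (k + 1) / (n + C)."""
    if n < 0 or not 0 <= k <= n:
        raise ValueError("need n >= 0 and 0 <= k <= n")
    return (k + 1) / (n + num_classes)


def bagged_estimate(per_tree_estimates: list[float]) -> float:
    """Average the estimates that several trees give for one case
    (assumed combination rule for bagged probability estimates)."""
    return mean(per_tree_estimates)


# Hypothetical leaves: both are pure, so raw frequencies tie at 1.0,
# while the Laplace correction ranks the larger leaf higher.
leaves = {"small_leaf": (2, 2), "large_leaf": (50, 50)}
for name, (k, n) in leaves.items():
    print(name, raw_estimate(k, n), laplace_estimate(k, n))

# Hypothetical estimates for one case from three bagged trees.
print(bagged_estimate([laplace_estimate(2, 2),
                       laplace_estimate(5, 8),
                       laplace_estimate(9, 10)]))
```

The sketch shows why smoothing matters for ranking in particular: unsmoothed frequencies produce many ties at 0 and 1 among pure leaves, whereas the corrected estimates pull small leaves toward the uniform prior and so separate cases that rest on different amounts of evidence.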
Pages: 199-215
Number of pages: 17