Confidence Decision Trees via Online and Active Learning for Streaming Data

Cited by: 2
Authors
De Rosa, Rocco [1]
Cesa-Bianchi, Nicolo [1]
Affiliation
[1] Univ Milan, Dipartimento Informat, I-20135 Milan, Italy
Keywords
INEQUALITIES
DOI
10.1613/jair.5440
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a statistical viewpoint, the analysis of decision tree classifiers in a streaming setting requires knowing when enough new information has been collected to justify splitting a leaf. Although some of the issues in the statistical analysis of Hoeffding trees have already been clarified, a general and rigorous study of confidence intervals for splitting criteria is missing. We fill this gap by deriving accurate confidence intervals to estimate the splitting gain in decision tree learning with respect to three criteria: entropy, Gini index, and a third index proposed by Kearns and Mansour. We also extend our confidence analysis to a selective sampling setting, in which the decision tree learner adaptively decides which labels to query in the stream. We provide theoretical guarantees bounding the probability that the decision tree learned via our selective sampling strategy classifies the next example in the stream suboptimally. Experiments on real and synthetic data in a streaming setting show that our trees are indeed more accurate than trees with the same number of leaves generated by state-of-the-art techniques. In addition, our active learning module empirically uses fewer labels without significantly hurting performance.
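To make the abstract's terms concrete, here is a minimal Python sketch, assuming binary labels and binary attributes, of the three splitting criteria it names and of the classic Hoeffding-tree split test (split when the best gain beats the runner-up by a Hoeffding-bound margin, as in Domingos and Hulten's VFDT). It does not reproduce the confidence intervals derived in this paper, which the abstract presents as a more rigorous replacement for this heuristic. The function names (`entropy`, `gini`, `kearns_mansour`, `split_gain`, `should_split`) and the parameters `value_range` and `delta` are illustrative choices. The Kearns-Mansour index is assumed here to take the form 2*sqrt(q(1-q)).

```python
import math

# Binary-label impurity functions of q, the fraction of positive labels.
def entropy(q: float) -> float:
    """Binary entropy in bits; defined as 0 at q = 0 or q = 1."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def gini(q: float) -> float:
    """Binary Gini index: 1 - q^2 - (1 - q)^2 = 2q(1 - q)."""
    return 2.0 * q * (1.0 - q)


def kearns_mansour(q: float) -> float:
    """Kearns-Mansour index, assumed here to be 2*sqrt(q(1 - q))."""
    return 2.0 * math.sqrt(q * (1.0 - q))


def split_gain(F, n0: int, pos0: int, n1: int, pos1: int) -> float:
    """Impurity reduction from splitting a leaf on a binary attribute.

    n0/n1 count examples with attribute value 0/1 at the leaf;
    pos0/pos1 count how many of those carry the positive label.
    """
    n = n0 + n1
    parent = F((pos0 + pos1) / n)
    children = 0.0
    for n_child, pos_child in ((n0, pos0), (n1, pos1)):
        if n_child > 0:  # an empty child contributes nothing
            children += (n_child / n) * F(pos_child / n_child)
    return parent - children


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Classic Hoeffding margin sqrt(R^2 ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(gains, value_range: float, delta: float, n: int) -> bool:
    """VFDT-style test: split when the best candidate beats the runner-up by
    more than epsilon. A gain of 0.0 is added to stand for "do not split"."""
    ranked = sorted(list(gains) + [0.0], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best - runner_up > hoeffding_epsilon(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical leaf with 400 examples and two binary attributes.
    g_a = split_gain(entropy, 200, 180, 200, 20)   # informative attribute
    g_b = split_gain(entropy, 200, 100, 200, 100)  # uninformative attribute
    # Entropy gain in bits lies in [0, 1], so the range R is taken as 1.0.
    print(round(g_a, 3), g_b, should_split([g_a, g_b], 1.0, 1e-6, 400))
    # prints: 0.531 0.0 True
```

The margin shrinks as 1/sqrt(n), so a leaf that has seen more examples can commit to a split on a smaller observed advantage. With the same attribute pattern but only 10 examples (for instance 3 of 5 and 2 of 5 positives), the gain of about 0.029 falls well below a margin of about 0.83, and the sketch keeps the leaf unsplit.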
Pages: 1031-1055
Number of pages: 25