Confidence Decision Trees via Online and Active Learning for Streaming Data

Cited by: 2
Authors
De Rosa, Rocco [1]
Cesa-Bianchi, Nicolo [1]
Affiliations
[1] Univ Milan, Dipartimento Informat, I-20135 Milan, Italy
Keywords
INEQUALITIES;
DOI
10.1613/jair.5440
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a statistical viewpoint, the analysis of decision tree classifiers in a streaming setting requires knowing when enough new information has been collected to justify splitting a leaf. Although some of the issues in the statistical analysis of Hoeffding trees have already been clarified, a general and rigorous study of confidence intervals for splitting criteria is missing. We fill this gap by deriving accurate confidence intervals to estimate the splitting gain in decision tree learning with respect to three criteria: entropy, Gini index, and a third index proposed by Kearns and Mansour. We also extend our confidence analysis to a selective sampling setting, in which the decision tree learner adaptively decides which labels to query in the stream. We provide theoretical guarantees bounding the probability that the decision tree learned via our selective sampling strategy suboptimally classifies the next example in the stream. Experiments on real and synthetic data in a streaming setting show that our trees are indeed more accurate than trees with the same number of leaves generated by state-of-the-art techniques. In addition, our active learning module empirically uses fewer labels without significantly hurting performance.
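For readers unfamiliar with the split test the abstract alludes to, the following is a minimal sketch of the classical Hoeffding-bound rule used by standard Hoeffding trees. It is not the refined confidence intervals derived in this paper; the function names `hoeffding_bound` and `should_split` and the default values of `delta` and `value_range` are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """One-sided Hoeffding deviation for the mean of n i.i.d. samples.

    With probability at least 1 - delta, the empirical mean of n samples
    taking values in an interval of width value_range exceeds the true
    mean by less than the returned epsilon.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, delta=1e-7, value_range=1.0):
    """Classical Hoeffding-tree heuristic (illustrative, not the paper's rule).

    Split a leaf once the observed gap between the best and second-best
    candidate splits exceeds the Hoeffding deviation for n examples.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps


if __name__ == "__main__":
    # The same observed gap of 0.05 is not enough evidence after 100
    # examples, but it is after 10,000 examples.
    for n in (100, 10_000):
        print(n, should_split(0.30, 0.25, n))
```

The rule treats the gain as if it were a simple average of bounded variables, which is the kind of statistical shortcut the abstract says motivates a more rigorous analysis of entropy, Gini index, and the Kearns-Mansour index.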
Pages: 1031 - 1055
Number of pages: 25
Related papers
50 in total
  • [21] Elastic online deep learning for dynamic streaming data
    Su, Rui
    Guo, Husheng
    Wang, Wenjian
    INFORMATION SCIENCES, 2024, 676
  • [22] Survey of Online Learning Algorithms for Streaming Data Classification
    Zhai T.-T.
    Gao Y.
    Zhu J.-W.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (04): 912 - 931
  • [23] Factorized Decision Trees for Active Learning in Recommender Systems
    Karimi, Rasoul
    Wistuba, Martin
    Nanopoulos, Alexandros
    Schmidt-Thieme, Lars
    2013 IEEE 25TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2013, : 404 - 411
  • [24] Active Learning for Streaming Data in A Contextual Bandit Framework
    Song, Linqi
    Xu, Jie
    Li, Congduan
    ICCDE 2019: PROCEEDINGS OF THE 2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING AND DATA ENGINEERING, 2019, : 29 - 35
  • [25] Online Residual Quantization Via Streaming Data Correlation Preserving
    Li, Pandeng
    Xie, Hongtao
    Min, Shaobo
    Zha, Zheng-Jun
    Zhang, Yongdong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 981 - 994
  • [26] Concept Drifting Detection on Noisy Streaming Data in Random Ensemble Decision Trees
    Li, Peipei
    Hu, Xuegang
    Liang, Qianghui
    Gao, Yunjun
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, 2009, 5632 : 236 - +
  • [27] A comparative study of simple online learning strategies for streaming data
    Universitat Jaume I, Dept. Llenguatges i Sistemes Informátics, Av. Sos Baynat s/n, 12071 Castelló de la Plana, Spain
    WSEAS Trans. Circuits Syst., 2008, 10 (900-910):
  • [28] Online Deep Learning from Doubly-Streaming Data
    Lian, Heng
    Atwood, John Scovil
    Hou, Bo-Jian
    Wu, Jian
    He, Yi
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3185 - 3194
  • [29] Robust Sparse Online Learning for Data Streams with Streaming Features
    Chen, Zhong
    He, Yi
    Wu, Di
    Zhan, Huixin
    Sheng, Victor
    Zhang, Kun
    PROCEEDINGS OF THE 2024 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2024, : 181 - 189
  • [30] Bayesian Credible Intervals for Online and Active Learning of Classification Trees
    Collet, Timothe
    Pietquin, Olivier
    2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2015, : 571 - 578