E-Tree: An Efficient Indexing Structure for Ensemble Models on Data Streams

Cited by: 38
Authors
Zhang, Peng [1]
Zhou, Chuan [1]
Wang, Peng [1]
Gao, Byron J. [1,2]
Zhu, Xingquan [3]
Guo, Li
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100093, Peoples R China
[2] Texas State Univ, Dept Comp Sci, San Marcos, TX 78666 USA
[3] Florida Atlantic Univ, Dept Comp & Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
Funding
National Natural Science Foundation of China
Keywords
Stream data mining; classification; ensemble learning; spatial indexing; concept drifting
DOI
10.1109/TKDE.2014.2298018
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Ensemble learning is a common tool for data stream classification, mainly because of its inherent advantages in handling large volumes of stream data and concept drift. To date, previous studies have focused primarily on building accurate ensemble models from stream data. However, linearly scanning a large number of base classifiers in the ensemble during prediction incurs significant costs in response time, preventing ensemble learning from being practical for many real-world, time-critical data stream applications, such as Web traffic stream monitoring, spam detection, and intrusion detection. In these applications, data streams usually arrive at speeds of gigabytes per second, and each stream record must be classified in a timely manner. To address this problem, we propose a novel Ensemble-tree (E-tree for short) indexing structure that organizes all base classifiers in an ensemble for fast prediction. On the one hand, E-trees treat ensembles as spatial databases and employ an R-tree-like, height-balanced structure to reduce the expected prediction time from linear to sub-linear complexity. On the other hand, E-trees can be updated automatically by continuously integrating new classifiers and discarding outdated ones, adapting well to new trends and patterns underlying data streams. Theoretical analysis and empirical studies on both synthetic and real-world data streams demonstrate the performance of our approach.
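The abstract's core idea, indexing base classifiers by the regions of feature space they cover so that a prediction only touches classifiers relevant to the incoming record, can be illustrated with a small Python sketch. It assumes, for illustration only, that each base classifier is summarized as a list of axis-aligned `(box, label)` regions (for instance, the leaves of a decision tree), that the ensemble prediction is a weighted vote over the regions covering the record, and that "outdated" simply means "oldest beyond a fixed `capacity`". The index follows the general R-tree pattern (minimum bounding rectangles, least-enlargement insertion, splits that grow the tree only at the root), but uses a simplified median split and does not reinsert entries from underfull nodes after deletion. All names (`ETreeSketch`, `add_classifier`, `discard`, `scores`, `max_entries`, `capacity`) are hypothetical; this is not the authors' implementation.

```python
import math
from collections import deque, namedtuple
from dataclasses import dataclass, field
from functools import reduce

def union(a, b):  # MBR of boxes a and b; a box is (lows, highs), one entry per feature
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))

def area(b):
    return math.prod(hi - lo for lo, hi in zip(b[0], b[1]))

def contains(b, x):  # is point x inside box b (boundaries inclusive)?
    return all(lo <= v <= hi for lo, hi, v in zip(b[0], b[1], x))

Rule = namedtuple("Rule", "box label weight clf_id")  # hypothetical: one region of one classifier

@dataclass
class Node:
    leaf: bool
    children: list = field(default_factory=list)  # Rules in leaves, Nodes above
    box: tuple = None                              # MBR of all children

def refit(node):  # recompute a node's MBR from its children
    node.box = reduce(union, (c.box for c in node.children)) if node.children else None

class ETreeSketch:
    """Hypothetical R-tree-like, height-balanced index over classifier regions."""
    def __init__(self, max_entries=4, capacity=10):
        self.max_entries = max_entries  # fan-out before a node splits
        self.capacity = capacity        # number of base classifiers kept
        self.root = Node(leaf=True)
        self.ids = deque()              # classifier ids, oldest first

    def add_classifier(self, clf_id, rules, weight=1.0):
        """Index a new classifier's (box, label) regions; evict the oldest (assumed outdated)."""
        for box, label in rules:
            sibling = self._insert(self.root, Rule(box, label, weight, clf_id))
            if sibling is not None:  # root overflowed: grow one level at the top
                self.root = Node(False, [self.root, sibling], union(self.root.box, sibling.box))
        self.ids.append(clf_id)
        if len(self.ids) > self.capacity:
            self.discard(self.ids.popleft())

    def _insert(self, node, rule):
        if node.leaf:
            node.children.append(rule)
        else:  # descend into the child whose MBR needs the least enlargement
            child = min(node.children, key=lambda c: area(union(c.box, rule.box)) - area(c.box))
            sibling = self._insert(child, rule)
            if sibling is not None:
                node.children.append(sibling)
        refit(node)
        return self._split(node) if len(node.children) > self.max_entries else None

    def _split(self, node):
        # Simplified split: sort by box centre along the axis where centres spread most.
        centres = [[(lo + hi) / 2 for lo, hi in zip(*c.box)] for c in node.children]
        axis = max(range(len(centres[0])),
                   key=lambda d: max(c[d] for c in centres) - min(c[d] for c in centres))
        order = sorted(range(len(centres)), key=lambda i: centres[i][axis])
        kids, half = node.children, len(order) // 2
        node.children = [kids[i] for i in order[:half]]
        sibling = Node(leaf=node.leaf, children=[kids[i] for i in order[half:]])
        refit(node)
        refit(sibling)
        return sibling

    def discard(self, clf_id):
        """Drop one classifier; shrinking only at the root keeps all leaves at one depth."""
        self._discard(self.root, clf_id)
        while not self.root.leaf and len(self.root.children) <= 1:
            self.root = self.root.children[0] if self.root.children else Node(leaf=True)

    def _discard(self, node, clf_id):
        if node.leaf:
            node.children = [r for r in node.children if r.clf_id != clf_id]
        else:
            for child in node.children:
                self._discard(child, clf_id)
            node.children = [c for c in node.children if c.children]  # drop empty subtrees
        refit(node)

    def scores(self, x):
        """Total classifier weight per label over the regions that cover point x."""
        votes, stack = {}, [self.root]
        while stack:
            node = stack.pop()
            if node.box is None or not contains(node.box, x):
                continue  # prune: nothing below this node can cover x
            for c in node.children:
                if not node.leaf:
                    stack.append(c)
                elif contains(c.box, x):
                    votes[c.label] = votes.get(c.label, 0.0) + c.weight
        return votes

    def predict(self, x):  # weighted vote; None when no region covers x
        return max(votes, key=votes.get) if (votes := self.scores(x)) else None
```

Because `scores` skips every subtree whose bounding box does not contain the record, the number of regions examined can be far smaller than in a linear scan of the ensemble; in the worst case, when every region overlaps the query, it still visits every entry. For example, `t = ETreeSketch(); t.add_classifier(1, [(((0.0, 0.0), (5.0, 5.0)), 0)]); t.predict((1.0, 1.0))` returns `0`.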
Pages: 461-474
Number of pages: 14