E-Tree: An Efficient Indexing Structure for Ensemble Models on Data Streams

Cited by: 38
Authors
Zhang, Peng [1]
Zhou, Chuan [1]
Wang, Peng [1]
Gao, Byron J. [1, 2]
Zhu, Xingquan [3]
Guo, Li
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100093, Peoples R China
[2] Texas State Univ, Dept Comp Sci, San Marcos, TX 78666 USA
[3] Florida Atlantic Univ, Dept Comp & Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
Funding
National Natural Science Foundation of China
Keywords
Stream data mining; classification; ensemble learning; spatial indexing; concept drifting
DOI
10.1109/TKDE.2014.2298018
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Ensemble learning is a common tool for data stream classification, mainly because of its inherent advantages in handling large volumes of stream data and concept drift. Previous studies have focused primarily on building accurate ensemble models from stream data. However, a linear scan of a large number of base classifiers in the ensemble during prediction incurs significant costs in response time, which prevents ensemble learning from being practical for many real-world time-critical data stream applications, such as Web traffic stream monitoring, spam detection, and intrusion detection. In these applications, data streams usually arrive at rates of gigabytes per second, and each stream record must be classified in a timely manner. To address this problem, we propose a novel Ensemble-tree (E-tree for short) indexing structure that organizes all base classifiers in an ensemble for fast prediction. On the one hand, E-trees treat ensembles as spatial databases and employ an R-tree-like height-balanced structure to reduce the expected prediction time from linear to sub-linear complexity. On the other hand, E-trees can be automatically updated by continuously integrating new classifiers and discarding outdated ones, adapting well to new trends and patterns underlying data streams. Theoretical analysis and empirical studies on both synthetic and real-world data streams demonstrate the performance of our approach.
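The abstract does not spell out how classifiers are stored in the tree, so the following is only a minimal sketch of the general idea, under the assumption that each base classifier can be reduced to a decision rule that covers an axis-aligned rectangle of feature space and votes for one label. The names `Rule`, `Node`, `build`, `search`, and `predict` are hypothetical, and the sort-and-pack bulk loading used here is a simplification standing in for the incremental insertion and deletion that the abstract describes. Prediction becomes a point query that skips every subtree whose bounding rectangle does not contain the record.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]  # one (low, high) interval per feature


def covers(rect: Rect, x: Tuple[float, ...]) -> bool:
    """True if point x lies inside the axis-aligned rectangle."""
    return all(lo <= v <= hi for (lo, hi), v in zip(rect, x))


def enclose(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    dims = range(len(rects[0]))
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in dims)


@dataclass
class Rule:
    rect: Rect  # assumed region of feature space where the rule fires
    label: str  # class label the rule votes for


@dataclass
class Node:
    mbr: Rect                                             # bounding rectangle
    rules: List[Rule] = field(default_factory=list)       # set on leaves
    children: List["Node"] = field(default_factory=list)  # set on inner nodes


def build(rules: List[Rule], fanout: int = 2) -> Node:
    """Pack a non-empty rule list bottom-up; all leaves end at the same depth."""
    ordered = sorted(rules, key=lambda r: r.rect[0][0])
    groups = [ordered[i:i + fanout] for i in range(0, len(ordered), fanout)]
    level = [Node(enclose([r.rect for r in g]), rules=g) for g in groups]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Node(enclose([n.mbr for n in g]), children=g) for g in groups]
    return level[0]


def search(node: Node, x: Tuple[float, ...]) -> List[Rule]:
    """Point query: collect rules covering x, pruning non-matching subtrees."""
    if not covers(node.mbr, x):
        return []
    if not node.children:
        return [r for r in node.rules if covers(r.rect, x)]
    hits: List[Rule] = []
    for child in node.children:
        hits.extend(search(child, x))
    return hits


def predict(root: Node, x: Tuple[float, ...]) -> Optional[str]:
    """Majority vote over the rules that cover x; None if no rule fires."""
    hits = search(root, x)
    if not hits:
        return None
    return Counter(r.label for r in hits).most_common(1)[0][0]
```

With a handful of rules the pruning saves little, but the query cost grows with the number of overlapping bounding rectangles rather than with the total number of rules, which is the source of the sub-linear expected prediction time claimed in the abstract.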
Pages: 461-474
Page count: 14
Related papers
50 in total
  • [21] An Efficient Data Retrieval Indexing Structure for Wireless Broadcasting System
    Lin, Lien-Fa
    Chen, Chao-Chun
    ISDA 2008: EIGHTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 2, PROCEEDINGS, 2008, : 651 - +
  • [22] Perfect KDB-tree: A compact KDB-tree structure for indexing multidimensional data
    Lin, HY
    Huang, PW
    THIRD INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND APPLICATIONS, VOL 2, PROCEEDINGS, 2005, : 411 - 414
  • [23] Q+Tree: An Efficient Quad Tree based Data Indexing for Parallelizing Dynamic and Reverse Skylines
    Islam, Md. Saiful
    Liu, Chengfei
    Rahayu, Wenny
    Anwar, Tarique
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 1291 - 1300
  • [24] A-Tree: A Dynamic Data Structure for Efficiently Indexing Arbitrary Boolean Expressions
    Ji, Shuping
    Jacobsen, Hans-Arno
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 817 - 829
  • [25] SPPC: a new tree structure for mining erasable patterns in data streams
    Le, Tuong
    Vo, Bay
    Fournier-Viger, Philippe
    Lee, Mi Young
    Baik, Sung Wook
    APPLIED INTELLIGENCE, 2019, 49 (02) : 478 - 495
  • [26] Evaluation of a dynamic tree structure for indexing query regions on streaming geospatial data
    Hart, Q
    Gertz, M
    Zhang, J
    ADVANCES IN SPATIAL AND TEMPORAL DATABASES, PROCEEDINGS, 2005, 3633 : 145 - 162
  • [27] DSTree: A tree structure for the mining of frequent sets from data streams
    Leung, Carson Kai-Sang
    Khan, Quamrul I.
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 928 - +
  • [28] The Po-tree: a real-time spatiotemporal data indexing structure
    Noël, G
    Servigne, S
    Laurini, R
    DEVELOPMENTS IN SPATIAL DATA HANDLING, 2005, : 259 - 270
  • [29] SPPC: a new tree structure for mining erasable patterns in data streams
    Tuong Le
    Bay Vo
    Philippe Fournier-Viger
    Mi Young Lee
    Sung Wook Baik
    Applied Intelligence, 2019, 49 : 478 - 495
  • [30] CAM2S: An Integrated Indexing Structure for Spatial Objects Generating Data Streams
    Gorawski, Marcin
    Malczok, Rafal
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS (CISIS 2010), 2010, : 33 - 40