E-Tree: An Efficient Indexing Structure for Ensemble Models on Data Streams

Cited by: 38
Authors
Zhang, Peng [1 ]
Zhou, Chuan [1 ]
Wang, Peng [1 ]
Gao, Byron J. [1 ,2 ]
Zhu, Xingquan [3 ]
Guo, Li
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing 100093, Peoples R China
[2] Texas State Univ, Dept Comp Sci, San Marcos, TX 78666 USA
[3] Florida Atlantic Univ, Dept Comp & Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
Funding
National Natural Science Foundation of China;
Keywords
Stream data mining; classification; ensemble learning; spatial indexing; concept drifting;
DOI
10.1109/TKDE.2014.2298018
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Ensemble learning is a common tool for data stream classification, mainly because of its inherent advantages of handling large volumes of stream data and concept drifting. Previous studies, to date, have been primarily focused on building accurate ensemble models from stream data. However, a linear scan of a large number of base classifiers in the ensemble during prediction incurs significant costs in response time, preventing ensemble learning from being practical for many real-world time-critical data stream applications, such as Web traffic stream monitoring, spam detection, and intrusion detection. In these applications, data streams usually arrive at a speed of GB/second, and it is necessary to classify each stream record in a timely manner. To address this problem, we propose a novel Ensemble-tree (E-tree for short) indexing structure to organize all base classifiers in an ensemble for fast prediction. On one hand, E-trees treat ensembles as spatial databases and employ an R-tree like height-balanced structure to reduce the expected prediction time from linear to sub-linear complexity. On the other hand, E-trees can be automatically updated by continuously integrating new classifiers and discarding outdated ones, well adapting to new trends and patterns underneath data streams. Theoretical analysis and empirical studies on both synthetic and real-world data streams demonstrate the performance of our approach.
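The idea of treating an ensemble as a spatial database can be illustrated with a minimal Python sketch. It is not the authors' E-tree: it assumes each base classifier contributes decision rules shaped as axis-aligned boxes in feature space, and it groups those boxes under a single level of bounding boxes rather than a full height-balanced R-tree. All names here (`Rule`, `Node`, `build_index`, `predict`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

# One (low, high) interval per feature; an assumed rule representation.
Box = Tuple[Tuple[float, float], ...]


@dataclass
class Rule:
    box: Box            # region of feature space the rule covers
    label: str          # class predicted inside that region
    classifier_id: int  # base classifier the rule came from


@dataclass
class Node:
    box: Box            # bounding box of every rule stored in this node
    rules: List[Rule]


def contains(box: Box, x: Tuple[float, ...]) -> bool:
    """True if point x lies inside the axis-aligned box."""
    return all(lo <= v <= hi for (lo, hi), v in zip(box, x))


def bounding_box(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box enclosing all the given boxes."""
    dims = len(boxes[0])
    return tuple(
        (min(b[d][0] for b in boxes), max(b[d][1] for b in boxes))
        for d in range(dims)
    )


def build_index(rules: List[Rule], leaf_size: int = 2) -> List[Node]:
    """Group rules into nodes after sorting on the first feature's lower bound.

    Simplification: a real R-tree nests nodes recursively and stays balanced
    under insertions and deletions; this sketch builds one static level.
    """
    ordered = sorted(rules, key=lambda r: r.box[0][0])
    nodes = []
    for i in range(0, len(ordered), leaf_size):
        chunk = ordered[i:i + leaf_size]
        nodes.append(Node(bounding_box([r.box for r in chunk]), chunk))
    return nodes


def predict(nodes: List[Node], x: Tuple[float, ...], default: str = "unknown"):
    """Majority vote over the rules covering x, skipping nodes that cannot match.

    Returns the predicted label and the number of nodes actually opened.
    """
    votes = Counter()
    opened = 0
    for node in nodes:
        if not contains(node.box, x):
            continue  # prune: no rule in this node can cover x
        opened += 1
        for rule in node.rules:
            if contains(rule.box, x):
                votes[rule.label] += 1
    label = votes.most_common(1)[0][0] if votes else default
    return label, opened
```

Because this sketch has only one level, it still scans every node's bounding box; the sub-linear expected prediction time described in the abstract would come from nesting such nodes into a balanced tree, which is omitted here.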
Pages: 461-474
Number of pages: 14
Related papers
50 in total
  • [1] An efficient peer-to-peer indexing tree structure for multidimensional data
    Zhang, Rong
    Qian, Weining
    Zhou, Aoying
    Zhou, Minqi
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2009, 25 (01): : 77 - 88
  • [2] Quantized Indexing Tree for Frequent Updates over Data Streams
    Su, Liang
    Wang, Bo
    Zou, Peng
    Jia, Yan
    Zuo, Ke
    Yang, ShuQiang
    20TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL 1, PROCEEDINGS, 2008, : 533 - 538
  • [3] An efficient tree structure for indexing feature vectors
    The-Anh Pham
    Barrat, Sabine
    Delalandre, Mathieu
    Ramel, Jean-Yves
    PATTERN RECOGNITION LETTERS, 2015, 55 : 42 - 50
  • [4] SC-tree: An efficient structure for high-dimensional data indexing
    Wang, Ben
    Gan, John Q.
    FLEXIBLE AND EFFICIENT INFORMATION HANDLING, 2006, 4042 : 164 - 176
  • [5] Dynamic Dimension Indexing for Efficient Skyline Maintenance on Data Streams
    Liu, Rui
    Li, Dominique
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT III, 2020, 12114 : 272 - 287
  • [6] Summary Prefix Tree: An over DHT Indexing Data Structure for Efficient Superset Search
    Ngom, Bassirou
    Makpangou, Mesaac
    2017 IEEE 16TH INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS (NCA), 2017, : 323 - 327
  • [7] An Efficient Ensemble Method for Classifying Skewed Data Streams
    Zhang, Juan
    Hu, Xuegang
    Zhang, Yuhong
    Li, Peipei
    BIO-INSPIRED COMPUTING AND APPLICATIONS, 2012, 6840 : 144 - 151
  • [8] Tree Based Fast Similarity Query Search Indexing on Outsourced Cloud Data Streams
    Balasubramanian, Balamurugan
    Durai, Kamalraj
    Sathyanarayanan, Jegadeeswari
    Muthukumarasamy, Sugumaran
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2019, 16 (05) : 871 - 878
  • [9] FA-Tree - A dynamic indexing structure for spatial data
    Chang, CC
    Shen, JJ
    Chou, YC
    SOFT COMPUTING AS TRANSDISCIPLINARY SCIENCE AND TECHNOLOGY, 2005, : 1071 - 1080
  • [10] XR-tree: Indexing XML data for efficient structural joins
    Jiang, HF
    Lu, HJ
    Wang, W
    Ooi, BC
    19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2003, : 253 - 264