Efficient Online Evaluation of Big Data Stream Classifiers

Cited by: 110
Authors
Bifet, Albert [1]
Morales, Gianmarco De Francisci [2]
Read, Jesse [3]
Holmes, Geoff [4]
Pfahringer, Bernhard [4]
Affiliations
[1] Huawei, Noah's Ark Lab, Hong Kong, People's Republic of China
[2] Aalto University, Helsinki, Finland
[3] Aalto University, HIIT, Helsinki, Finland
[4] University of Waikato, Hamilton, New Zealand
Keywords
Data Streams; Evaluation; Online Learning; Classification; Agreement
DOI
10.1145/2783258.2783372
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The evaluation of classifiers in data streams is fundamental so that poorly performing models can be identified, and either improved or replaced by better-performing models. This is an increasingly relevant and important task as stream data is generated from more sources, in real time, in large quantities, and is now considered the largest source of big data. Both researchers and practitioners need to be able to effectively evaluate the performance of the methods they employ. However, there are major challenges for evaluation in a stream. Instances arriving in a data stream are usually time-dependent, and the underlying concept that they represent may evolve over time. Furthermore, the massive quantity of data also tends to exacerbate issues such as class imbalance. Current frameworks for evaluating streaming and online algorithms can give predictions in real time, but because they use a prequential setting, they build only one model and thus cannot compute the statistical significance of results in real time. In this paper we propose a new evaluation methodology for big data streams. This methodology addresses unbalanced data streams, data where change occurs on different time scales, and the question of how to split the data between training and testing, over multiple models.
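The prequential setting mentioned in the abstract is commonly understood as test-then-train: each arriving instance is first used to test the model and only then to train it. The following is a minimal sketch of that single-model baseline, not of the methodology the paper proposes. The `MajorityClassClassifier` learner, the `prequential_accuracy` helper, and the six-instance stream are all hypothetical and introduced only for illustration.

```python
from collections import Counter


class MajorityClassClassifier:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any training there is nothing to predict.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Test-then-train: score each instance before the model learns from it."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:  # test first
            correct += 1
        model.learn(x, y)          # then train on the same instance
        total += 1
    return correct / total if total else 0.0


# Hypothetical stream of (features, label) pairs, in arrival order.
stream = [((0,), "a"), ((1,), "a"), ((2,), "b"),
          ((3,), "a"), ((4,), "b"), ((5,), "b")]
accuracy = prequential_accuracy(stream, MajorityClassClassifier())
print(f"prequential accuracy: {accuracy:.3f}")
```

Because only one model is built over a single pass, this loop yields one accuracy estimate and no spread across runs, which is the limitation the abstract points to when it says statistical significance cannot be computed in this setting.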
Pages: 59-68
Number of pages: 10