Efficient Online Evaluation of Big Data Stream Classifiers

Cited by: 110
Authors
Bifet, Albert [1]
Morales, Gianmarco De Francisci [2]
Read, Jesse [3]
Holmes, Geoff [4]
Pfahringer, Bernhard [4]
Affiliations
[1] HUAWEI, Noah's Ark Lab, Hong Kong, People's Republic of China
[2] Aalto Univ, Helsinki, Finland
[3] Aalto Univ, HIIT, Helsinki, Finland
[4] Univ Waikato, Hamilton, New Zealand
Keywords
Data Streams; Evaluation; Online Learning; Classification; CLASSIFICATION; AGREEMENT;
DOI
10.1145/2783258.2783372
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The evaluation of classifiers in data streams is fundamental so that poorly-performing models can be identified, and either improved or replaced by better-performing models. This is an increasingly relevant and important task as stream data is generated from more sources, in real-time, in large quantities, and is now considered the largest source of big data. Both researchers and practitioners need to be able to effectively evaluate the performance of the methods they employ. However, there are major challenges for evaluation in a stream. Instances arriving in a data stream are usually time-dependent, and the underlying concept that they represent may evolve over time. Furthermore, the massive quantity of data also tends to exacerbate issues such as class imbalance. Current frameworks for evaluating streaming and online algorithms are able to give predictions in real-time, but as they use a prequential setting, they build only one model, and are thus not able to compute the statistical significance of results in real-time. In this paper we propose a new evaluation methodology for big data streams. This methodology addresses unbalanced data streams, data where change occurs on different time scales, and the question of how to split the data between training and testing, over multiple models.
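The prequential (test-then-train) setting mentioned in the abstract can be illustrated with a minimal sketch. This is not the methodology the paper proposes; it only shows the single-model baseline the abstract contrasts against. The names `MajorityClass` and `prequential_accuracy`, and the toy stream, are hypothetical and chosen for illustration.

```python
from collections import Counter


class MajorityClass:
    """Toy incremental classifier (hypothetical): predicts the most
    frequent label seen so far, or None before any label is seen."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Test-then-train: each instance is first used to test the model,
    then to update it, so no separate hold-out set is needed."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:  # test on the instance first
            correct += 1
        total += 1
        model.learn(x, y)          # then train on the same instance
    return correct / total if total else 0.0


# Toy stream of (features, label) pairs; the values are made up.
stream = [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a")]
accuracy = prequential_accuracy(stream, MajorityClass())
print(accuracy)
```

Because a single model is built over a single pass, this loop yields one accuracy estimate. That is the limitation the abstract points to when it notes that such frameworks cannot compute the statistical significance of results in real time.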
Pages: 59 - 68
Page count: 10
Related papers
50 in total
  • [21] A Framework for the Efficient Collection of Big Data from Online Social Networks
    Petrillo, Umberto Ferraro
    Consolo, Stefano
    2014 INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS (INCOS), 2014, : 34 - 41
  • [22] A Comparative Analysis of Stream Data Classifiers and Conventional Classifiers for Anomaly Intrusion Detection
    Kumari, S. Ranjitha
    Kumari, P. Krishna
    ADVANCED SCIENCE LETTERS, 2015, 21 (10) : 3300 - 3304
  • [23] Data stream classification and big data analytics
    Krawczyk, Bartosz
    Wozniak, Michal
    Stefanowski, Jerzy
    NEUROCOMPUTING, 2015, 150 : 238 - 239
  • [24] Performance Evaluation of Machine Learning Classifiers for Stock Market Prediction in Big Data Environment
    Kalra, Sneh
    Gupta, Sachin
    Prasad, Jay Shankar
    JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2019, 14 (05) : 295 - 306
  • [25] ESA-Stream: Efficient Self-Adaptive Online Data Stream Clustering (Extended Abstract)
    Li, Yanni
    Li, Hui
    Wang, Zhi
    Liu, Bing
    Cui, Jiangtao
    Fei, Hang
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 2329 - +
  • [26] IoT Big Data Stream Mining
    Morales, Gianmarco De Francisci
    Bifet, Albert
    Khan, Latifur
    Gama, Joao
    Fan, Wei
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 2119 - 2120
  • [27] Big Data Stream Learning with SAMOA
    Bifet, Albert
    De Francisci Morales, Gianmarco
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2014, : 1199 - 1202
  • [28] Incremental Weighted Naive Bayes Classifiers for Data Stream
    Salperwyck, Christophe
    Lemaire, Vincent
    Hue, Carine
    DATA SCIENCE, LEARNING BY LATENT STRUCTURES, AND KNOWLEDGE DISCOVERY, 2015, : 179 - 190
  • [29] Distance variable improvement of time-series big data stream evaluation
    Wibisono, Ari
    Mursanto, Petrus
    Adibah, Jihan
    Bayu, Wendy D. W. T.
    Rizki, May Iffah
    Hasani, Lintang Matahari
    Ahli, Valian Fil
    JOURNAL OF BIG DATA, 2020, 7 (01)