Efficient Online Evaluation of Big Data Stream Classifiers

Cited by: 110
Authors
Bifet, Albert [1]
Morales, Gianmarco De Francisci [2]
Read, Jesse [3]
Holmes, Geoff [4]
Pfahringer, Bernhard [4]
Affiliations
[1] Huawei Noah's Ark Lab, Hong Kong, People's Republic of China
[2] Aalto University, Helsinki, Finland
[3] Aalto University, HIIT, Helsinki, Finland
[4] University of Waikato, Hamilton, New Zealand
Keywords
Data Streams; Evaluation; Online Learning; Classification; Agreement
DOI
10.1145/2783258.2783372
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The evaluation of classifiers in data streams is fundamental so that poorly performing models can be identified, and either improved or replaced by better-performing models. This is an increasingly relevant and important task as stream data is generated from more sources, in real time, in large quantities, and is now considered the largest source of big data. Both researchers and practitioners need to be able to effectively evaluate the performance of the methods they employ. However, there are major challenges for evaluation in a stream. Instances arriving in a data stream are usually time-dependent, and the underlying concept that they represent may evolve over time. Furthermore, the massive quantity of data also tends to exacerbate issues such as class imbalance. Current frameworks for evaluating streaming and online algorithms are able to give predictions in real time, but as they use a prequential setting, they build only one model, and are thus not able to compute the statistical significance of results in real time. In this paper we propose a new evaluation methodology for big data streams. This methodology addresses unbalanced data streams, data where change occurs on different time scales, and the question of how to split the data between training and testing, over multiple models.
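The abstract contrasts prequential evaluation, which builds a single model, with evaluation over multiple models, which makes statistical significance computable. The Python sketch below illustrates that contrast under stated assumptions. The toy learner `LookupLearner`, the helpers `prequential_accuracy` and `multi_model_accuracies`, and the rule "each instance tests one randomly chosen model and trains all the others" are hypothetical choices for illustration only. The abstract does not specify how the paper splits data between training and testing, so this should not be read as the authors' method.

```python
import random
import statistics
from collections import Counter, defaultdict
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

# Hypothetical toy format: each instance is (feature_value, label).
Instance = Tuple[Hashable, Hashable]


class LookupLearner:
    """Hypothetical toy online classifier: predicts the most frequent label
    seen for this feature value, falling back to the overall majority label."""

    def __init__(self) -> None:
        self.by_feature: defaultdict = defaultdict(Counter)
        self.overall: Counter = Counter()

    def predict(self, x: Hashable) -> Optional[Hashable]:
        counts = self.by_feature.get(x) or self.overall
        # Before any training there is nothing to predict; None is scored as wrong.
        return counts.most_common(1)[0][0] if counts else None

    def learn(self, x: Hashable, y: Hashable) -> None:
        self.by_feature[x][y] += 1
        self.overall[y] += 1


def prequential_accuracy(stream: Iterable[Instance], learner: LookupLearner) -> float:
    """Test-then-train with a single model: one accuracy figure, no spread."""
    correct = seen = 0
    for x, y in stream:
        correct += int(learner.predict(x) == y)  # test first...
        learner.learn(x, y)                      # ...then train on the same instance
        seen += 1
    return correct / seen if seen else 0.0


def multi_model_accuracies(
    stream: Iterable[Instance],
    make_learner: Callable[[], LookupLearner],
    k: int,
    seed: int = 0,
) -> List[float]:
    """Hypothetical cross-validation-style scheme with k models: each instance
    tests one randomly chosen model and trains all the others."""
    rng = random.Random(seed)
    learners = [make_learner() for _ in range(k)]
    correct = [0] * k
    tested = [0] * k
    for x, y in stream:
        fold = rng.randrange(k)  # the model that treats this instance as test data
        correct[fold] += int(learners[fold].predict(x) == y)
        tested[fold] += 1
        for i, learner in enumerate(learners):
            if i != fold:
                learner.learn(x, y)  # every other model trains on it
    return [c / t if t else 0.0 for c, t in zip(correct, tested)]


if __name__ == "__main__":
    stream = [("a", 1), ("a", 1), ("b", 0), ("a", 1), ("b", 0)]
    print(prequential_accuracy(stream, LookupLearner()))  # prints 0.6
    accs = multi_model_accuracies(stream * 20, LookupLearner, k=3)
    print(statistics.mean(accs), statistics.stdev(accs))
```

Because `multi_model_accuracies` returns k accuracies from k models evaluated on the same stream, their mean and spread can feed a significance test, whereas `prequential_accuracy` yields only a single figure. Keeping `seed` fixed reproduces the same instance-to-model assignment when two learners are compared, which is one way to obtain paired results; this too is an assumption of the sketch.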
Pages: 59-68
Number of pages: 10