Online Clustering for Novelty Detection and Concept Drift in Data Streams

Cited by: 6
Authors
Garcia, Kemilly Dearo [1, 2]
Poel, Mannes [1]
Kok, Joost N. [1]
de Carvalho, Andre C. P. L. F. [2]
Affiliations
[1] Univ Twente, Enschede, Netherlands
[2] Univ Sao Paulo, ICMC, Sao Paulo, Brazil
Keywords
Data stream; Concept drift; Novelty detection; Online learning; Classification
DOI
10.1007/978-3-030-30244-3_37
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data streams are large volumes of data that arrive continuously, with a probability distribution that may change over time. Depending on how the data distribution changes, different phenomena can occur: new classes can appear, or concept drift can occur in existing classes. Machine learning algorithms have often been used to model such data. New classes are patterns that were not seen during the training of the current classification model but appear after some time. Concept drift occurs when the concepts associated with a dataset change as new data arrive. This paper proposes a new algorithm based on kNN that uses micro-clusters as prototypes and incrementally updates the micro-clusters, or creates new micro-clusters when novelties are detected. In the online phase, each instance close to a micro-cluster is considered an extension of that micro-cluster and is used to adapt the model to concept drift. The proposed algorithm is experimentally compared with a state-of-the-art classifier from the data stream literature and one baseline. According to the experimental results, the proposed algorithm increases predictive performance over time by incrementally learning changes in the data distribution.
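
The general idea in the abstract (micro-clusters as prototypes, updated incrementally, with new micro-clusters created for novelties) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give its details, so the sketch assumes a nearest-prototype (1-NN) rule over centroids, a CluStream-style summary (count, linear sum, squared sum), and a hypothetical distance threshold controlled by the invented parameters `min_radius` and `radius_factor`. The class names and the `novel-N` labels are likewise illustrative.

```python
import math


class MicroCluster:
    """Summary of nearby points: count, per-dimension linear and squared sums."""

    def __init__(self, point, label):
        self.n = 1
        self.ls = list(point)
        self.ss = [v * v for v in point]
        self.label = label

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root of the summed per-dimension variance; 0.0 for a single point.
        var = sum(q / self.n - (s / self.n) ** 2 for s, q in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))

    def absorb(self, point):
        # Incremental update: the centroid follows the data, which is how
        # this sketch adapts to gradual drift.
        self.n += 1
        for i, v in enumerate(point):
            self.ls[i] += v
            self.ss[i] += v * v


class MicroClusterStream:
    """Nearest-prototype classifier over micro-clusters (illustrative only)."""

    def __init__(self, min_radius=1.0, radius_factor=2.0):
        self.min_radius = min_radius        # assumed floor on the accept distance
        self.radius_factor = radius_factor  # assumed multiple of the cluster radius
        self.clusters = []
        self.novel_count = 0

    def _nearest(self, point, label=None):
        best, best_d = None, math.inf
        for mc in self.clusters:
            if label is not None and mc.label != label:
                continue
            d = math.dist(point, mc.centroid())
            if d < best_d:
                best, best_d = mc, d
        return best, best_d

    def _threshold(self, mc):
        return max(self.min_radius, self.radius_factor * mc.radius())

    def fit_offline(self, points, labels):
        # Offline phase: build labelled micro-clusters from training data.
        for p, y in zip(points, labels):
            mc, d = self._nearest(p, label=y)
            if mc is not None and d <= self._threshold(mc):
                mc.absorb(p)
            else:
                self.clusters.append(MicroCluster(p, y))

    def process(self, point):
        # Online phase: extend the nearest micro-cluster if the instance is
        # close enough, otherwise open a new micro-cluster as a novelty.
        mc, d = self._nearest(point)
        if mc is not None and d <= self._threshold(mc):
            mc.absorb(point)
            return mc.label
        self.novel_count += 1
        label = f"novel-{self.novel_count}"
        self.clusters.append(MicroCluster(point, label))
        return label


model = MicroClusterStream()
model.fit_offline([(0, 0), (0.2, 0.1), (5, 5), (5.1, 4.9)], ["a", "a", "b", "b"])
for x in [(0.5, 0.5), (20, 20), (20.3, 19.8)]:
    print(x, model.process(x))
```

In this sketch an instance near an existing micro-cluster shifts that cluster's centroid, while an instance far from every prototype starts a new micro-cluster that later instances can join. A fuller implementation would likely also need to forget stale micro-clusters and to buffer outliers before declaring a novelty; both are omitted here.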
Pages: 448-459
Number of pages: 12