Tracking clusters and anomalies in evolving data streams

Cited by: 4
Authors
Guggilam, Sreelekha [1]
Chandola, Varun [1, 2]
Patra, Abani [1, 3]
Affiliations
[1] Univ Buffalo State Univ New York SUNY, Computat Data Sci & Engn, Buffalo, NY 14260 USA
[2] Univ Buffalo State Univ New York SUNY, Comp Sci & Engn, Buffalo, NY USA
[3] Tufts Univ, Data Intens Studies Ctr, Medford, MA 02155 USA
Funding
U.S. National Science Foundation
Keywords
anomaly detection; Bayesian nonparametric models; clustering-based anomaly detection; evolving stream data; extreme value theory; algorithms
DOI
10.1002/sam.11552
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data-driven anomaly detection methods typically build a model for the normal behavior of the target system, and score each data instance with respect to this model. A threshold is invariably needed to identify data instances with high (or low) scores as anomalies. This presents a practical limitation on the applicability of such methods, since most methods are sensitive to the choice of the threshold, and it is challenging to set optimal thresholds. The issue is exacerbated in a streaming scenario, where the optimal thresholds vary with time. We present a probabilistic framework to explicitly model the normal and anomalous behaviors and probabilistically reason about the data. An extreme value theory based formulation is proposed to model the anomalous behavior as the extremes of the normal behavior. As a specific instantiation, a joint nonparametric clustering and anomaly detection algorithm (INCAD) is proposed that models the normal behavior as a Dirichlet process mixture model. Results on a variety of datasets, including streaming data, show that the proposed method provides effective and simultaneous clustering and anomaly detection without requiring strong initialization and threshold parameters.
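The abstract's idea of treating anomalies as the extremes of normal behavior, rather than picking a score threshold by hand, can be illustrated with a generic peaks-over-threshold sketch. This is not the INCAD algorithm and it involves no Dirichlet process mixture. It is only a minimal example under stated assumptions: anomaly scores are one-dimensional, the tail above a high empirical quantile is approximated by a generalized Pareto distribution, and the parameters are estimated by the method of moments. The function name `evt_threshold` and the parameters `init_quantile` and `target_prob` are hypothetical names introduced here for illustration.

```python
import math
import random
import statistics


def evt_threshold(scores, init_quantile=0.95, target_prob=1e-3):
    """Return a score level exceeded with probability of about target_prob.

    Generic peaks-over-threshold sketch (an assumption for illustration,
    not the paper's method): fit a generalized Pareto distribution to the
    excesses over a high empirical quantile via method of moments.
    """
    ordered = sorted(scores)
    n = len(ordered)
    t = ordered[int(init_quantile * n)]           # initial high quantile
    excesses = [s - t for s in ordered if s > t]  # tail samples above t
    if len(excesses) < 2:
        raise ValueError("not enough tail samples to fit the tail model")
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    if var == 0:
        raise ValueError("tail samples are all identical")
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)            # shape estimate
    sigma = 0.5 * mean * (ratio + 1.0)  # scale estimate
    r = target_prob * n / len(excesses)
    if abs(xi) < 1e-6:
        # Exponential-tail limit of the generalized Pareto quantile.
        return t - sigma * math.log(r)
    return t + (sigma / xi) * (r ** (-xi) - 1.0)


# Usage on synthetic scores; real scores would come from a fitted model.
random.seed(0)
demo_scores = [random.expovariate(1.0) for _ in range(5000)]
demo_threshold = evt_threshold(demo_scores)
flagged = [s for s in demo_scores if s > demo_threshold]
print(f"threshold={demo_threshold:.3f}, flagged={len(flagged)}")
```

In a streaming setting, one option would be to recompute the threshold over a sliding window of recent scores, so that it follows the drift the abstract mentions. That windowing policy is likewise an assumption and is not taken from the paper.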
Pages: 156-178
Number of pages: 23