Tracking clusters and anomalies in evolving data streams

Cited by: 4
Authors
Guggilam, Sreelekha [1]
Chandola, Varun [1, 2]
Patra, Abani [1, 3]
Affiliations
[1] Univ Buffalo State Univ New York SUNY, Computat Data Sci & Engn, Buffalo, NY 14260 USA
[2] Univ Buffalo State Univ New York SUNY, Comp Sci & Engn, Buffalo, NY USA
[3] Tufts Univ, Data Intens Studies Ctr, Medford, MA 02155 USA
Funding
U.S. National Science Foundation
Keywords
anomaly detection; Bayesian nonparametric models; clustering-based anomaly detection; evolving stream data; extreme value theory; algorithms
DOI
10.1002/sam.11552
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data-driven anomaly detection methods typically build a model for the normal behavior of the target system, and score each data instance with respect to this model. A threshold is invariably needed to identify data instances with high (or low) scores as anomalies. This presents a practical limitation on the applicability of such methods, since most methods are sensitive to the choice of the threshold, and it is challenging to set optimal thresholds. The issue is exacerbated in a streaming scenario, where the optimal thresholds vary with time. We present a probabilistic framework to explicitly model the normal and anomalous behaviors and probabilistically reason about the data. An extreme value theory based formulation is proposed to model the anomalous behavior as the extremes of the normal behavior. As a specific instantiation, a joint nonparametric clustering and anomaly detection algorithm (INCAD) is proposed that models the normal behavior as a Dirichlet process mixture model. Results on a variety of datasets, including streaming data, show that the proposed method provides effective and simultaneous clustering and anomaly detection without requiring strong initialization and threshold parameters.
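The idea of treating anomalies as extremes of normal behavior, rather than picking a score threshold by hand, can be illustrated with a generic peaks-over-threshold sketch. This is not the INCAD algorithm from the paper: it is a minimal, assumed example that fits a generalized Pareto distribution to the excesses over a high quantile by the method of moments and reads off the threshold whose exceedance probability equals a chosen risk level. The function name `pot_threshold` and the parameters `init_quantile` and `risk` are hypothetical, introduced only for illustration.

```python
import math
import random
import statistics


def pot_threshold(scores, init_quantile=0.95, risk=1e-3):
    """Return a threshold z with P(score > z) roughly equal to `risk`.

    Illustrative peaks-over-threshold estimate, not the paper's method.
    """
    ordered = sorted(scores)
    n = len(ordered)
    u = ordered[int(init_quantile * n)]            # initial high threshold
    excesses = [s - u for s in ordered if s > u]   # peaks over u
    n_u = len(excesses)

    # Method-of-moments estimates for the generalized Pareto tail
    # (valid when the true shape is below 1/2).
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)              # shape
    sigma = 0.5 * mean * (ratio + 1.0)    # scale

    # Invert (n_u / n) * (1 + xi * (z - u) / sigma) ** (-1 / xi) = risk.
    tail = risk * n / n_u
    if abs(xi) < 1e-9:                    # exponential-tail limit
        return u - sigma * math.log(tail)
    return u + (sigma / xi) * (tail ** (-xi) - 1.0)


# Synthetic "normal" scores with an exponential tail (assumed data).
random.seed(0)
scores = [random.expovariate(1.0) for _ in range(20000)]
z = pot_threshold(scores)
flagged = [s for s in scores if s > z]
```

In a streaming setting the same estimate would have to be refreshed as new scores arrive, which is the time-varying threshold problem the abstract points to.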
Pages: 156-178
Number of pages: 23
Related papers
50 in total
  • [21] Tracking the Evolution of Clusters in Social Media Streams
    Anwar, Tarique
    Nepal, Surya
    Paris, Cecile
    Yang, Jian
    Wu, Jia
    Sheng, Quan Z.
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (02) : 701 - 715
  • [22] Detecting Anomalies with Autoencoders on Data Streams
    Cazzonelli, Lucas
    Kulbach, Cedric
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT I, 2023, 13713 : 258 - 274
  • [23] TADS: Transformation of Anomalies in Data Streams
    Green, Wyatt
    Johnsten, Tom
    Benton, Ryan G.
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 4284 - 4292
  • [24] A New Evolving Data Streams System With Data Fusion
    Yu Huijun
    Wang Zhigang
    Liu Xiaoyan
    2012 INTERNATIONAL CONFERENCE ON INDUSTRIAL CONTROL AND ELECTRONICS ENGINEERING (ICICEE), 2012, : 1743 - 1746
  • [25] Classifying evolving data streams with partially labeled data
    Borchani, Hanen
    Larranaga, Pedro
    Bielza, Concha
    INTELLIGENT DATA ANALYSIS, 2011, 15 (05) : 655 - 670
  • [26] Heterogeneous ensemble selection for evolving data streams
    Luong, Anh Vu
    Nguyen, Tien Thanh
    Liew, Alan Wee-Chung
    Wang, Shilin
    PATTERN RECOGNITION, 2021, 112
  • [27] A Survey: Approaches for Handling Evolving Data Streams
    Wankhade, Kapil
    Hasan, Tasneem
    Thool, Ravindra
    2013 INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT 2013), 2013, : 621 - 625
  • [28] New Ensemble Methods For Evolving Data Streams
    Bifet, Albert
    Holmes, Geoff
    Pfahringer, Bernhard
    Kirkby, Richard
    Gavalda, Ricard
    KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, : 139 - 147
  • [29] Activity Recognition with Evolving Data Streams: A Review
    Abdallah, Zahraa S.
    Gaber, Mohamed Medhat
    Srinivasan, Bala
    Krishnaswamy, Shonali
    ACM COMPUTING SURVEYS, 2018, 51 (04)
  • [30] Adaptive Learning from Evolving Data Streams
    Bifet, Albert
    Gavalda, Ricard
    ADVANCES IN INTELLIGENT DATA ANALYSIS VIII, PROCEEDINGS, 2009, 5772 : 249 - 260