Mining deviants in time series data streams

Cited by: 0
Authors
Muthukrishnan, S [1]
Shah, R [1]
Vitter, JS [1]
Affiliations
[1] Rutgers State Univ, Piscataway, NJ 08855 USA
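Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract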
One of the central tasks in managing, monitoring and mining data streams is that of identifying outliers. There is a long history of study of various outliers in statistics and databases, and a recent focus on mining outliers in data streams. Here, we adopt the notion of "deviants" from Jagadish et al. [19] as outliers. Deviants are based on one of the most fundamental statistical concepts, the standard deviation (or variance). Formally, deviants are defined based on a representation sparsity metric, i.e., deviants are values whose removal from the dataset leads to an improved compressed representation of the remaining items. Thus, deviants are not global maxima/minima, but rather appropriate local aberrations. Deviants are known to be of great mining value in time series databases. We present the first known algorithms for identifying deviants on massive data streams. Our algorithms monitor streams using very small space (polylogarithmic in data size) and are able to quickly find deviants at any instant, as the data stream evolves over time. For all versions of this problem (uni- vs. multivariate time series, optimal vs. near-optimal vs. heuristic solutions, offline vs. streaming), our algorithms share the same framework of maintaining a hierarchical set of candidate deviants that is updated as the time series data is progressively revealed. We show experimentally, using real network traffic data (SNMP aggregate time series) as well as synthetic data, that our algorithm is remarkably accurate in determining the deviants.
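The definition in the abstract can be illustrated with a small brute-force sketch. This is not the streaming algorithm of the paper: it assumes, purely for illustration, that the compressed representation is a histogram with fixed, equal-width buckets, each summarized by its mean, and that the quality of the representation is the total sum of squared errors (SSE). The names `sse`, `brute_force_deviants`, `bucket_size`, and `k` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def brute_force_deviants(series, bucket_size, k):
    """Try every set of k indices and return (indices, error) for the set
    whose removal minimizes the total per-bucket SSE of the remaining items.

    Assumption: bucket boundaries are fixed and equal-width; this is an
    exhaustive illustration, exponential in k, not a streaming method.
    """
    n = len(series)
    best = None
    for removed in combinations(range(n), k):
        removed_set = set(removed)
        error = 0.0
        for start in range(0, n, bucket_size):
            end = min(start + bucket_size, n)
            kept = [series[i] for i in range(start, end) if i not in removed_set]
            error += sse(kept)
        if best is None or error < best[1]:
            best = (removed, error)
    return best


# The value 9 is a local aberration inside the first bucket,
# while 20 is the global maximum but typical for its own bucket.
series = [1, 1, 9, 1, 20, 20, 20, 20]
deviants, error = brute_force_deviants(series, bucket_size=4, k=1)
print(deviants, error)
```

With these toy values the single deviant is the element at index 2 (the value 9), not the global maximum 20: removing it lets both buckets be represented exactly by their means, which matches the abstract's point that deviants are local aberrations rather than global extremes.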
Pages: 41-50
Number of pages: 10
Related papers
50 in total
  • [31] A data mining framework for time series estimation
    Hu, Xiao
    Xu, Peng
    Wu, Shaozhi
    Asgari, Shadnaz
    Bergsneider, Marvin
    JOURNAL OF BIOMEDICAL INFORMATICS, 2010, 43 (02) : 190 - 199
  • [32] Time Series Data Mining: A Unifying View
    Keogh, Eamonn
    2024 IEEE 11TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS, DSAA 2024, 2024, : 424 - 426
  • [33] Research on framework of time series data mining
    Yan, XB
    Li, YJ
    Jin, SW
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE & ENGINEERING, VOLS 1 AND 2, 2004, : 197 - 200
  • [34] Visual mining of spatial time series data
    Andrienko, G
    Andrienko, N
    Gatalsky, P
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2004, PROCEEDINGS, 2004, 3202 : 524 - 527
  • [35] Introducing time series chains: a new primitive for time series data mining
    Zhu, Yan
    Imamura, Makoto
    Nikovski, Daniel
    Keogh, Eamonn
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 60 (02) : 1135 - 1161
  • [36] Introducing time series chains: a new primitive for time series data mining
    Yan Zhu
    Makoto Imamura
    Daniel Nikovski
    Eamonn Keogh
    Knowledge and Information Systems, 2019, 60 : 1135 - 1161
  • [37] Applications of Data Mining to Time Series of Electrical Disturbance Data
    Cornforth, David
    2009 IEEE POWER & ENERGY SOCIETY GENERAL MEETING, VOLS 1-8, 2009, : 2179 - 2186
  • [38] Active mining of data streams
    Fan, W
    Huang, YA
    Wang, HX
    Yu, PS
    PROCEEDINGS OF THE FOURTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2004, : 457 - 461
  • [39] Mining databases and data streams
    Zaniolo, Carlo
    Thakkar, Hetal
    HOMELAND SECURITY TECHNOLOGY CHALLENGES: FROM SENSING AND ENCRYPTING TO MINING AND MODELING, 2008, : 103 - +
  • [40] Mining data streams: A review
    Gaber, MM
    Zaslavsky, A
    Krishnaswamy, S
    SIGMOD RECORD, 2005, 34 (02) : 18 - 26