Mining deviants in time series data streams

Cited by: 0
Authors
Muthukrishnan, S [1]
Shah, R [1]
Vitter, JS [1]
Affiliations
[1] Rutgers State Univ, Piscataway, NJ 08855 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the central tasks in managing, monitoring and mining data streams is identifying outliers. There is a long history of study of various kinds of outliers in statistics and databases, and a recent focus on mining outliers in data streams. Here, we adopt the notion of "deviants" from Jagadish et al. [19] as outliers. Deviants are based on one of the most fundamental statistical concepts, the standard deviation (or variance). Formally, deviants are defined in terms of a representation sparsity metric: deviants are values whose removal from the dataset leads to an improved compressed representation of the remaining items. Thus, deviants are not global maxima or minima, but rather appropriate local aberrations. Deviants are known to be of great mining value in time series databases. We present the first known algorithms for identifying deviants on massive data streams. Our algorithms monitor streams using very small space (polylogarithmic in the data size) and are able to quickly find deviants at any instant, as the data stream evolves over time. For all versions of this problem (univariate vs. multivariate time series; optimal vs. near-optimal vs. heuristic solutions; offline vs. streaming), our algorithms share the same framework: maintaining a hierarchical set of candidate deviants that is updated as the time series data is progressively revealed. We show experimentally, using real network traffic data (SNMP aggregate time series) as well as synthetic data, that our algorithms are remarkably accurate in determining the deviants.
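The definition of a deviant in the abstract can be illustrated with a small brute-force sketch. This is not the authors' streaming algorithm. It assumes a simplified offline setting in which the series is compressed into fixed, equal-width buckets, each represented by its mean, and it searches for the single value whose removal most reduces the total squared error. The function names and the fixed bucket layout are hypothetical choices made for illustration only.

```python
def bucket_error(values):
    """Sum of squared errors when each value is approximated by the bucket mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def total_error(series, bucket_size, removed=None):
    """Total error over fixed equal-width buckets (an assumed layout),
    optionally skipping the value at index `removed`."""
    total = 0.0
    for start in range(0, len(series), bucket_size):
        chunk = series[start:start + bucket_size]
        bucket = [v for i, v in enumerate(chunk, start) if i != removed]
        total += bucket_error(bucket)
    return total


def best_single_deviant(series, bucket_size):
    """Index of the value whose removal yields the smallest total error."""
    return min(range(len(series)),
               key=lambda i: total_error(series, bucket_size, removed=i))


# The global maximum is 20.0, but the local aberration is the 5.0 at index 2.
series = [1.0, 1.0, 5.0, 1.0, 20.0, 20.0, 20.0, 20.0]
index = best_single_deviant(series, bucket_size=4)
print(index, series[index])  # prints: 2 5.0
```

In this example the chosen value is neither the largest nor the smallest in the series, which matches the abstract's point that deviants are local aberrations rather than global extremes. The brute-force search keeps the whole series in memory, so it says nothing about the small-space guarantees claimed for the streaming algorithms.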
Pages: 41-50
Page count: 10