Mining deviants in time series data streams

Cited by: 0
Authors
Muthukrishnan, S [1]
Shah, R [1]
Vitter, JS [1]
Affiliations
[1] Rutgers State Univ, Piscataway, NJ 08855 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
One of the central tasks in managing, monitoring and mining data streams is identifying outliers. There is a long history of studying various kinds of outliers in statistics and databases, and a recent focus on mining outliers in data streams. Here, we adopt the notion of "deviants" from Jagadish et al. [19] as outliers. Deviants are based on one of the most fundamental statistical concepts, the standard deviation (or variance). Formally, deviants are defined in terms of a representation sparsity metric: they are the values whose removal from the dataset leads to an improved compressed representation of the remaining items. Thus, deviants are not global maxima or minima, but rather appropriate local aberrations. Deviants are known to be of great mining value in time series databases. We present the first known algorithms for identifying deviants on massive data streams. Our algorithms monitor streams using very small space (polylogarithmic in the data size) and can quickly find deviants at any instant as the data stream evolves over time. For all versions of this problem (univariate vs. multivariate time series; optimal vs. near-optimal vs. heuristic solutions; offline vs. streaming), our algorithms share the same framework: they maintain a hierarchical set of candidate deviants that is updated as the time series data is progressively revealed. We show experimentally, using real network traffic data (SNMP aggregate time series) as well as synthetic data, that our algorithms are remarkably accurate in determining the deviants.
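To make the definition of deviants concrete, the following is a minimal offline sketch and not the streaming algorithm of the paper. It assumes a simplified compressed representation in which the series is cut into fixed-width buckets and each bucket is summarized by its mean; the function names, the `bucket_width` parameter, and the brute-force search are all illustrative assumptions. The deviants are taken to be the k points whose removal minimizes the total squared error of the remaining points.

```python
from itertools import combinations


def bucket_sse(values):
    """Squared error incurred when every value is represented by the bucket mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def representation_error(series, removed, bucket_width):
    """Total squared error over fixed-width buckets, ignoring the removed indices."""
    total = 0.0
    for start in range(0, len(series), bucket_width):
        end = min(start + bucket_width, len(series))
        kept = [series[i] for i in range(start, end) if i not in removed]
        total += bucket_sse(kept)
    return total


def find_deviants(series, k, bucket_width):
    """Brute force: try every set of k indices and keep the one with the lowest error.

    This is exponential in k and only meant to illustrate the definition.
    """
    best = min(
        combinations(range(len(series)), k),
        key=lambda idx: representation_error(series, set(idx), bucket_width),
    )
    return list(best), representation_error(series, set(best), bucket_width)


# Hypothetical data: the value 5 is far from the global maximum (10),
# but it is the local aberration inside the first bucket.
series = [1, 1, 5, 1, 10, 10, 10, 10]
deviants, err = find_deviants(series, k=1, bucket_width=4)
print(deviants, err)
```

In this toy series the selected deviant is the 5 at index 2 rather than any of the 10s, which matches the abstract's remark that deviants are local aberrations and not global extrema.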
Pages: 41-50
Number of pages: 10
Related papers
50 in total
  • [1] Mining deviants in a time series database
    Jagadish, HV
    Koudas, N
    Muthukrishnan, S
    PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 1999, : 102 - 113
  • [2] On mining time-changing data streams
    Huang, SC
    Dong, YS
    CHINESE JOURNAL OF ELECTRONICS, 2006, 15 (02): : 220 - 224
  • [3] On mining time-changing data streams
    Department of Computer Science and Engineering, Southeast University, Nanjing 210018, China
    Unknown
    Chin J Electron, 2006, 2 (220-224):
  • [4] Data mining in medical time series
    Mikut, Ralf
    Reischl, Markus
    Burmeister, Ole
    Loose, Tobias
    BIOMEDIZINISCHE TECHNIK, 2006, 51 (5-6): : 288 - 293
  • [5] A review on time series data mining
    Fu, Tak-chung
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2011, 24 (01) : 164 - 181
  • [6] Process Mining for Time Series Data
    Ziolkowski, Tobias
    Koschmider, Agnes
    Schubert, Rene
    Renz, Matthias
    ENTERPRISE, BUSINESS-PROCESS AND INFORMATION SYSTEMS MODELING, 2022, 450 : 347 - 350
  • [7] Time series financial data mining
    Tseng, CC
    Kang, CT
    PROCEEDINGS OF THE 8TH JOINT CONFERENCE ON INFORMATION SCIENCES, VOLS 1-3, 2005, : 1035 - 1038
  • [8] A Survey on Time Series Data Mining
    Fakhrazari, Amin
    Vakilzadian, Hamid
    2017 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2017, : 476 - 481
  • [9] On privacy in time series data mining
    Zhu, Ye
    Fu, Yongjian
    Fu, Huirong
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2008, 5012 : 479 - +
  • [10] Time-Series Data Mining
    Esling, Philippe
    Agon, Carlos
    ACM COMPUTING SURVEYS, 2012, 45 (01)