Mining deviants in time series data streams

Cited by: 0

Authors:

Muthukrishnan, S [1]

Shah, R [1]

Vitter, JS [1]

Affiliation:

[1] Rutgers State Univ, Piscataway, NJ 08855 USA

Source:

16TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS | 2004

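Keywords:

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract: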

One of the central tasks in managing, monitoring, and mining data streams is identifying outliers. There is a long history of studying various kinds of outliers in statistics and databases, and a recent focus on mining outliers in data streams. Here, we adopt the notion of "deviants" from Jagadish et al. [19] as outliers. Deviants are based on one of the most fundamental statistical concepts, the standard deviation (or variance). Formally, deviants are defined in terms of a representation sparsity metric: they are values whose removal from the dataset leads to an improved compressed representation of the remaining items. Thus, deviants are not global maxima or minima but rather appropriate local aberrations. Deviants are known to be of great mining value in time series databases. We present the first known algorithms for identifying deviants on massive data streams. Our algorithms monitor streams using very small space (polylogarithmic in the data size) and can quickly find deviants at any instant as the data stream evolves over time. For all versions of this problem (univariate vs. multivariate time series; optimal vs. near-optimal vs. heuristic solutions; offline vs. streaming), our algorithms share the same framework: they maintain a hierarchical set of candidate deviants that is updated as the time series data is progressively revealed. We show experimentally, using real network traffic data (SNMP aggregate time series) as well as synthetic data, that our algorithms are remarkably accurate in determining the deviants.
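To make the definition of a deviant concrete, the following is a minimal offline sketch and not the streaming algorithm the abstract describes. It assumes the simplest possible compressed representation, a single bucket that stores only the mean of the remaining values, and it uses a greedy selection rule as a stand-in for an optimal one. The function names `sse` and `greedy_deviants` are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def sse(values):
    """Sum of squared errors when `values` are represented by their mean."""
    if not values:
        return 0.0
    mu = fmean(values)
    return sum((v - mu) ** 2 for v in values)


def greedy_deviants(series, k):
    """Greedily pick up to k indices whose removal most reduces the SSE.

    Assumption: one bucket (the mean) is the compressed representation.
    This brute-force loop is for illustration only; it is neither
    small-space nor guaranteed optimal.
    """
    remaining = dict(enumerate(series))  # index -> value still in the data
    chosen = []
    for _ in range(min(k, len(series))):
        # Try removing each remaining point; keep the removal that leaves
        # the smallest representation error for the rest.
        best_idx = min(
            remaining,
            key=lambda i: sse([v for j, v in remaining.items() if j != i]),
        )
        chosen.append(best_idx)
        del remaining[best_idx]
    return chosen, sse(list(remaining.values()))


if __name__ == "__main__":
    series = [10, 11, 10, 50, 9, 10, 11, -30, 10]
    indices, residual = greedy_deviants(series, k=2)
    print(indices, residual)
```

With a single bucket, the selected points are simply those farthest from the mean. The local character of deviants mentioned in the abstract would only appear with a multi-bucket representation, where each point is compared against the mean of its own bucket.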


Pages: 41-50

Page count: 10
