Hierarchical clustering of time-series data streams

Cited by: 129
Authors
Rodrigues, Pedro Pereira [1, 2]
Gama, Joao [1, 3]
Pedroso, Joao Pedro [4, 5]
Affiliations
[1] LIAAD INESC Porto LA, P-4050190 Oporto, Portugal
[2] Univ Porto, Fac Sci, P-4050190 Oporto, Portugal
[3] Univ Porto, Fac Econ, P-4050190 Oporto, Portugal
[4] UESP INESC, P-4169007 Oporto, Portugal
[5] Univ Porto, Fac Sci, P-4169007 Oporto, Portugal
Keywords
data stream analysis; clustering streaming time series; incremental hierarchical clustering; change detection
DOI
10.1109/TKDE.2007.190727
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents and analyzes an incremental system for clustering streaming time series. The Online Divisive-Agglomerative Clustering (ODAC) system continuously maintains a tree-like hierarchy of clusters that evolves with data, using a top-down strategy. The splitting criterion is a correlation-based dissimilarity measure among time series, splitting each node by the farthest pair of streams. The system also uses a merge operator that reaggregates a previously split node in order to react to changes in the correlation structure between time series. The split and merge operators are triggered in response to changes in the diameters of existing clusters, assuming that in stationary environments, expanding the structure leads to a decrease in the diameters of the clusters. The system is designed to process thousands of data streams that flow at a high rate. The main features of the system include update time and memory consumption that do not depend on the number of examples in the stream. Moreover, the time and memory required to process an example decrease whenever the cluster structure expands. Experimental results on artificial and real data assess the processing qualities of the system, suggesting competitive performance on clustering streaming time series, and also explore its ability to deal with concept drift.
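The split step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the dissimilarity takes the form sqrt((1 - corr) / 2), which is one common correlation-based choice; the abstract does not give the formula. The names `StreamStats`, `farthest_pair`, and `split` are hypothetical, and the merge operator and the split/merge triggering test are omitted. The running sums are what keep memory independent of the number of examples seen.

```python
import math
from itertools import combinations


class StreamStats:
    """Running sums that suffice to compute pairwise Pearson correlation
    over k streams; memory is O(k^2) regardless of stream length."""

    def __init__(self, k):
        self.k = k
        self.n = 0
        self.s = [0.0] * k                          # sum of x_i
        self.sq = [0.0] * k                         # sum of x_i^2
        self.prod = [[0.0] * k for _ in range(k)]   # sum of x_i * x_j, i < j

    def update(self, example):
        """Fold one example (one value per stream) into the running sums."""
        self.n += 1
        for i, xi in enumerate(example):
            self.s[i] += xi
            self.sq[i] += xi * xi
            for j in range(i + 1, self.k):
                self.prod[i][j] += xi * example[j]

    def corr(self, i, j):
        """Pearson correlation of streams i and j from the running sums."""
        i, j = min(i, j), max(i, j)
        n = self.n
        cov = self.prod[i][j] - self.s[i] * self.s[j] / n
        var_i = self.sq[i] - self.s[i] ** 2 / n
        var_j = self.sq[j] - self.s[j] ** 2 / n
        if var_i <= 0 or var_j <= 0:
            return 0.0  # constant stream: treat as uncorrelated
        # Clamp to [-1, 1] to absorb floating-point rounding.
        return max(-1.0, min(1.0, cov / math.sqrt(var_i * var_j)))

    def dissimilarity(self, i, j):
        """Assumed form: 0 for corr = 1, 1 for corr = -1."""
        return math.sqrt((1.0 - self.corr(i, j)) / 2.0)


def farthest_pair(stats):
    """Pair of streams with the largest dissimilarity (the cluster diameter)."""
    return max(combinations(range(stats.k), 2),
               key=lambda p: stats.dissimilarity(*p))


def split(stats):
    """Use the farthest pair as pivots; assign each stream to the nearer one."""
    a, b = farthest_pair(stats)
    left, right = [a], [b]
    for i in range(stats.k):
        if i in (a, b):
            continue
        nearer_a = stats.dissimilarity(i, a) <= stats.dissimilarity(i, b)
        (left if nearer_a else right).append(i)
    return left, right
```

For example, feeding ten examples of three streams `[t, 2 * t + 1, -t]` into `StreamStats(3)` and calling `split` groups the two positively correlated streams together and isolates the anti-correlated one.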
Pages: 615-627
Number of pages: 13