A bit level representation for time series data mining with shape based similarity

Cited by: 47
Authors
Bagnall, Anthony [1]
Ratanamahatana, Chotirat 'Ann'
Keogh, Eamonn
Lonardi, Stefano
Janacek, Gareth
Affiliations
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
[2] Chulalongkorn Univ, Dept Comp Engn, Bangkok 10330, Thailand
[3] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
clipping; time series data mining; Kolmogorov complexity
DOI
10.1007/s10618-005-0028-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clipping is the process of transforming a real-valued series into a sequence of bits that records whether each data point is above or below the mean. In this paper, we argue that clipping is a useful and flexible transformation for the exploratory analysis of large time-dependent data sets. We demonstrate that time series stored as bits can be compressed and manipulated very efficiently and that, under some assumptions, the discriminatory power of clipped series is asymptotically equivalent to that of the raw data. Unlike other transformations, clipped series can be compared directly with raw data series. We show that this allows us to form a tight lower-bounding metric for Euclidean and Dynamic Time Warping (DTW) distance, and hence to query by content efficiently. Clipped data can be used in conjunction with a host of algorithms and statistical tests that follow naturally from the binary nature of the data. A series of experiments illustrates how clipped series can be used in increasingly complex ways to achieve better results than other popular representations. The usefulness of the proposed representation is demonstrated by the fact that, at the same compression ratio, results with clipped data are consistently better than those achieved with a Wavelet or Discrete Fourier Transform, for both clustering and query by content. The flexibility of the representation is shown by the fact that we can take advantage of a variable run-length encoding of clipped series to define an approximation of the Kolmogorov complexity, and hence perform Kolmogorov-based clustering.
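The clipping transformation, and the idea of comparing a raw query directly against a clipped series, can be illustrated with a short Python sketch. It is not the authors' implementation: the function names (`clip`, `pack_bits`, `hamming`, `run_lengths`, `euclidean`, `lb_clip`) are hypothetical, and `lb_clip` is simply one lower bound that follows from the definition of clipping, assuming the stored series was clipped at threshold 0 (for example after z-normalisation, so that its mean is 0). The paper's exact bound, and its DTW variant, may be defined differently.

```python
import math
from itertools import groupby
from statistics import fmean


def clip(series, threshold=None):
    """Clip a real-valued series to bits: 1 where a value is above the threshold,
    else 0. By default the threshold is the series mean."""
    t = fmean(series) if threshold is None else threshold
    return [1 if x > t else 0 for x in series]


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one Python int (first bit most significant)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def hamming(a_packed, b_packed):
    """Count positions where two packed clipped series of equal length differ."""
    return bin(a_packed ^ b_packed).count("1")


def run_lengths(bits):
    """Run-length encode a clipped series as (bit, run length) pairs."""
    return [(b, len(list(group))) for b, group in groupby(bits)]


def euclidean(a, b):
    """Euclidean distance between two raw series of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lb_clip(query, clipped):
    """Lower bound on euclidean(query, raw), where `clipped` was produced from the
    unseen raw series with threshold 0 (hypothetical helper, see lead-in).

    If clipped[t] == 1 the raw value is > 0, so a query value <= 0 is at least
    |query[t]| away from it; if clipped[t] == 0 the raw value is <= 0, so a query
    value > 0 is at least query[t] away. Agreeing positions contribute 0."""
    total = 0.0
    for q, c in zip(query, clipped):
        if (c == 1 and q <= 0) or (c == 0 and q > 0):
            total += q * q
    return math.sqrt(total)
```

For example, with `y = [-1.5, 0.5, 2.0, -1.0]` (mean exactly 0) the clipped series is `[0, 1, 1, 0]`, which packs to the integer 6. For the query `q = [1.0, -0.5, 1.5, -2.0]`, `lb_clip(q, clip(y, threshold=0.0))` is sqrt(1.25) ≈ 1.118, below the true Euclidean distance sqrt(8.5) ≈ 2.915, so a search can discard candidates using only their bits. Packing bits into integers also means that comparing two clipped series reduces to an XOR and a population count, which is one reason the bit representation is cheap to store and manipulate.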
Pages: 11-40
Number of pages: 30
Related papers (50 in total)
  • [31] Data Mining of Time Series Based on Wave Cluster
    Dong, Jixue
    2009 International Forum on Information Technology and Applications, Vol. 1, Proceedings, 2009, pp. 697-699
  • [32] Towards perception based time series data mining
    Batyrshin, Ildar Z.
    Sheremetov, Leonid
    Forging New Frontiers: Fuzzy Pioneers I, 2007, vol. 217, pp. 217-230
  • [33] An algorithm for time series data mining based on clustering
    Wu, Shaozhi
    Wu, Yue
    Wang, Ying
    Ye, Yalan
    2006 International Conference on Communications, Circuits and Systems Proceedings, Vols. 1-4: Vol. 1: Signal Processing, 2006, pp. 2155 ff.
  • [34] Three-dimensional piecewise cloud representation for time series data mining
    Si, Gangquan
    Zheng, Kai
    Zhou, Zhou
    Pan, Chengjie
    Xu, Xiang
    Qu, Kai
    Zhang, Yanbin
    Neurocomputing, 2018, vol. 316, pp. 78-94
  • [35] A time series representation for Temporal Web Mining
    Samia, Mireille
    Conrad, Stefan
    2006 Seventh International Baltic Conference on Databases and Information Systems: Proceedings, 2006, pp. 132 ff.
  • [36] Similarity measures for time series data classification using grid representation and matrix distance
    Ye, Yanqing
    Jiang, Jiang
    Ge, Bingfeng
    Dou, Yajie
    Yang, Kewei
    Knowledge and Information Systems, 2019, vol. 60, no. 2, pp. 1105-1134
  • [37] Comparison of similarity measures and clustering methods for time-series medical data mining
    Hirano, S.
    Tsumoto, S.
    Data Mining and Knowledge Discovery: Tools and Technology V, 2003, vol. 5098, pp. 219-225
  • [38] Finding Structural Similarity in Time Series Data Using Bag-of-Patterns Representation
    Lin, Jessica
    Li, Yuan
    Scientific and Statistical Database Management, Proceedings, 2009, vol. 5566, pp. 461-477
  • [40] Time series data mining: similarity search and its application to the stock indices in the region
    Lovric, Miodrag
    Milanovic, Marina
    Stamenkovic, Milan
    Technics Technologies Education Management-TTEM, 2012, vol. 7, no. 4, pp. 1605-1614