A bit level representation for time series data mining with shape based similarity

Cited by: 47
Authors
Bagnall, Anthony [1]
Ratanamahatana, Chotirat 'Ann'
Keogh, Eamonn
Lonardi, Stefano
Janacek, Gareth
Affiliations
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
[2] Chulalongkorn Univ, Dept Comp Engn, Bangkok 10330, Thailand
[3] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
clipping; time series data mining; Kolmogorov complexity;
DOI
10.1007/s10618-005-0028-0
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clipping is the process of transforming a real valued series into a sequence of bits representing whether each data point is above or below the average. In this paper, we argue that clipping is a useful and flexible transformation for the exploratory analysis of large time dependent data sets. We demonstrate how time series stored as bits can be very efficiently compressed and manipulated and that, under some assumptions, the discriminatory power with clipped series is asymptotically equivalent to that achieved with the raw data. Unlike other transformations, clipped series can be compared directly to the raw data series. We show that this means we can form a tight lower bounding metric for Euclidean and Dynamic Time Warping distance and hence efficiently query by content. Clipped data can be used in conjunction with a host of algorithms and statistical tests that naturally follow from the binary nature of the data. A series of experiments illustrates how clipped series can be used in increasingly complex ways to achieve better results than other popular representations. The usefulness of the proposed representation is demonstrated by the fact that the results with clipped data are consistently better than those achieved with a Wavelet or Discrete Fourier Transformation at the same compression ratio for both clustering and query by content. The flexibility of the representation is shown by the fact that we can take advantage of a variable Run Length Encoding of clipped series to define an approximation of the Kolmogorov complexity and hence perform Kolmogorov based clustering.
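As a minimal illustration of the clipping transformation defined in the abstract, the Python sketch below maps each value to a bit according to whether it lies above the series mean, then collapses the bits into runs. It is not the authors' implementation: the function names `clip` and `run_length_encode` are hypothetical, the handling of values exactly equal to the mean (mapped to 0 here) is an assumption the abstract does not specify, and the plain run-length encoding shown is a simpler stand-in for the variable Run Length Encoding the paper uses.

```python
from itertools import groupby
from statistics import fmean


def clip(series):
    """Return one bit per value: 1 if above the series mean, else 0.

    Assumption: values equal to the mean map to 0; the abstract
    does not say how ties are handled.
    """
    mu = fmean(series)
    return [1 if x > mu else 0 for x in series]


def run_length_encode(bits):
    """Collapse a bit sequence into (bit, run_length) pairs.

    This is plain run-length encoding, used only for illustration.
    """
    return [(bit, sum(1 for _ in run)) for bit, run in groupby(bits)]


# Toy series with mean 2.175
series = [2.0, 3.5, 4.0, 1.0, 0.5, 0.2, 3.0, 3.2]
bits = clip(series)
runs = run_length_encode(bits)
print(bits)  # [0, 1, 1, 0, 0, 0, 1, 1]
print(runs)  # [(0, 1), (1, 2), (0, 3), (1, 2)]
```

Eight real values reduce to eight bits, and long runs of identical bits compress further under run-length encoding, which is the property the abstract relies on for efficient storage.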
Pages: 11-40
Page count: 30
Related papers
50 in total
  • [1] A Bit Level Representation for Time Series Data Mining with Shape Based Similarity
    Anthony Bagnall
    Chotirat “Ann” Ratanamahatana
    Eamonn Keogh
    Stefano Lonardi
    Gareth Janacek
    Data Mining and Knowledge Discovery, 2006, 13 : 11 - 40
  • [2] An Enhanced Binary Symbolic Representation for Time Series Data Mining Based Similarity
    Sun, Meiyu
    Fang, Jianan
    2008 7TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-23, 2008, : 7130 - 7134
  • [3] Similarity search based on shape representation in time-series data sets
    Jiang, Rong
    Li, Deyi
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2000, 37 (05): : 601 - 608
  • [4] A novel bit level time series representation with implication of similarity search and clustering
    Ratanamahatana, C
    Keogh, E
    Bagnall, AJ
    Lonardi, S
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2005, 3518 : 771 - 777
  • [5] Similarity measure based on multidimensional shape feature representation for time series
    Li, H.-L., 1600, Systems Engineering Society of China (33):
  • [6] Similarity problems in time series data mining
    Yan, XB
    Li, YJ
    Fan, B
    PROCEEDINGS OF 2003 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE & ENGINEERING, VOLS I AND II, 2003, : 382 - 385
  • [7] A Trend Based Similarity Calculation Approach for Mining Time Series Data
    Yang, Yuhang
    Xia, Yingju
    Ge, Fujiang
    Meng, Yao
    Yu, Hao
    INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, IMECS 2012, VOL I, 2012, : 461 - 464
  • [8] Research on similarity mining in time series data sets
    Zheng, Bin-Xiang
    Xi, Yu-Geng
    Du, Xiu-Hua
    Kongzhi yu Juece/Control and Decision, 2002, 17 (05): : 527 - 531
  • [9] Similarity Measure Based on Incremental Warping Window for Time Series Data Mining
    Li, Hailin
    Wang, Cheng
    IEEE ACCESS, 2019, 7 : 3909 - 3917
  • [10] A shape-based similarity measure for time series data with ensemble learning
    Nakamura, Tetsuya
    Taki, Keishi
    Nomiya, Hiroki
    Seki, Kazuhiro
    Uehara, Kuniaki
    PATTERN ANALYSIS AND APPLICATIONS, 2013, 16 (04) : 535 - 548