A bit level representation for time series data mining with shape based similarity

Cited by: 47
Authors
Bagnall, Anthony [1]
Ratanamahatana, Chotirat 'Ann'
Keogh, Eamonn
Lonardi, Stefano
Janacek, Gareth
Affiliations
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
[2] Chulalongkorn Univ, Dept Comp Engn, Bangkok 10330, Thailand
[3] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
clipping; time series data mining; Kolmogorov complexity
DOI
10.1007/s10618-005-0028-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clipping is the process of transforming a real valued series into a sequence of bits representing whether each data point is above or below the average. In this paper, we argue that clipping is a useful and flexible transformation for the exploratory analysis of large time dependent data sets. We demonstrate how time series stored as bits can be very efficiently compressed and manipulated and that, under some assumptions, the discriminatory power with clipped series is asymptotically equivalent to that achieved with the raw data. Unlike other transformations, clipped series can be compared directly to the raw data series. We show that this means we can form a tight lower bounding metric for Euclidean and Dynamic Time Warping distance and hence efficiently query by content. Clipped data can be used in conjunction with a host of algorithms and statistical tests that naturally follow from the binary nature of the data. A series of experiments illustrates how clipped series can be used in increasingly complex ways to achieve better results than other popular representations. The usefulness of the proposed representation is demonstrated by the fact that the results with clipped data are consistently better than those achieved with a Wavelet or Discrete Fourier Transformation at the same compression ratio for both clustering and query by content. The flexibility of the representation is shown by the fact that we can take advantage of a variable Run Length Encoding of clipped series to define an approximation of the Kolmogorov complexity and hence perform Kolmogorov based clustering.
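As a rough illustration of the clipping transformation described in the abstract, the following minimal Python sketch maps a series to bits relative to its mean and then run-length encodes the result. The function names `clip` and `run_lengths` are hypothetical, the treatment of values exactly equal to the mean (mapped to 0) is an assumption, and the plain run-length encoding shown here is not necessarily the variable Run Length Encoding used in the paper.

```python
from itertools import groupby
from statistics import fmean


def clip(series):
    """Return 1 for each value above the series mean, else 0.

    Assumption: values exactly equal to the mean are mapped to 0.
    """
    mean = fmean(series)
    return [1 if x > mean else 0 for x in series]


def run_lengths(bits):
    """Plain run-length encoding of a bit sequence as (bit, run length) pairs.

    This is an illustrative stand-in, not the paper's variable-length scheme.
    """
    return [(bit, sum(1 for _ in run)) for bit, run in groupby(bits)]


series = [1.0, 2.0, 6.0, 7.0, 3.0, 1.0]  # mean is 20/6, roughly 3.33
bits = clip(series)
runs = run_lengths(bits)
print(bits)
print(runs)
```

Long runs of identical bits compress into few pairs, which gives an intuition for why a run-length view of a clipped series can serve as a rough proxy for its complexity.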
Pages: 11-40
Number of pages: 30