DDR: an index method for large time-series datasets

Times cited: 14
Authors
An, JY
Chen, YPP
Chen, HX
Affiliations
[1] Deakin Univ, Sch Informat Technol, Fac Sci & Technol, Melbourne, Vic 3125, Australia
[2] Australia Res Council Ctr Bioinformat, Melbourne, Vic, Australia
[3] Univ Tsukuba, Inst Informat Sci & Elect, Tsukuba, Ibaraki 305, Japan
Keywords
time series; indexing; dimensionality reduction
DOI
10.1016/j.is.2004.05.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The tree index structure is a traditional method for searching for similar data in large datasets. It relies on the presupposition that most sub-trees are pruned during the search, which reduces the number of page accesses. However, time-series datasets generally have very high dimensionality. Because of the so-called curse of dimensionality, pruning becomes much less effective in high dimensions. Consequently, the tree index structure is not a suitable method for time-series datasets. In this paper, we propose a two-phase (filtering and refinement) method for searching time-series datasets. In the filtering step, quantized time series are used to construct a compact file, which is scanned to filter out irrelevant series. The small set of remaining candidates is passed to the second step for refinement. In this step, we introduce an effective index compression method named grid-based datawise dimensionality reduction (DDR), which attempts to preserve the characteristics of the time series. An experimental comparison with existing techniques demonstrates the utility of our approach. (c) 2004 Elsevier Ltd. All rights reserved.
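The two-phase filter-and-refine idea described in the abstract can be illustrated with a small sketch. It is not the paper's DDR method, whose details are not given here: it assumes a simple VA-file-style quantization (every value mapped to one of 2**bits equal-width cells over the dataset's global value range), Euclidean distance, and a range query with threshold `eps`. The names `QuantizedFile` and `range_search` are hypothetical. The key property is that the distance from the query to each series' grid cells never exceeds the true distance, so filtering on it cannot discard a true match.

```python
import math
import random
from typing import List, Sequence, Tuple


class QuantizedFile:
    """Hypothetical compact file of quantized time series (VA-file style).

    Each value is replaced by the index of one of 2**bits equal-width cells
    spanning the global [lo, hi] range of the dataset. This is an assumed
    scheme for illustration, not the paper's DDR compression.
    """

    def __init__(self, series: Sequence[Sequence[float]], bits: int = 4) -> None:
        values = [v for s in series for v in s]
        self.lo, hi = min(values), max(values)
        self.cells = 2 ** bits
        # Fall back to width 1.0 if every value is identical.
        self.width = (hi - self.lo) / self.cells or 1.0
        self.codes: List[List[int]] = [[self._cell(v) for v in s] for s in series]

    def _cell(self, v: float) -> int:
        # Clamp so the maximum value lands in the last cell.
        return min(int((v - self.lo) / self.width), self.cells - 1)

    def lower_bound(self, query: Sequence[float], idx: int) -> float:
        """Distance from the query to the nearest point of each stored cell.

        It never exceeds the exact Euclidean distance, so it is safe to prune on.
        """
        total = 0.0
        for q, c in zip(query, self.codes[idx]):
            cell_lo = self.lo + c * self.width
            cell_hi = cell_lo + self.width
            if q < cell_lo:
                total += (cell_lo - q) ** 2
            elif q > cell_hi:
                total += (q - cell_hi) ** 2
        return math.sqrt(total)


def range_search(
    data: Sequence[Sequence[float]],
    qfile: QuantizedFile,
    query: Sequence[float],
    eps: float,
) -> Tuple[List[int], int]:
    """Return (indices within eps of query, number of candidates after filtering)."""
    # Phase 1, filtering: scan only the compact codes.
    candidates = [i for i in range(len(data)) if qfile.lower_bound(query, i) <= eps]
    # Phase 2, refinement: exact distances for the surviving candidates only.
    hits = [i for i in candidates if math.dist(query, data[i]) <= eps]
    return hits, len(candidates)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic random-walk series standing in for a real dataset.
    data = []
    for _ in range(200):
        x, s = 0.0, []
        for _ in range(64):
            x += random.gauss(0.0, 1.0)
            s.append(x)
        data.append(s)
    qfile = QuantizedFile(data, bits=4)
    hits, n_cand = range_search(data, qfile, data[0], eps=10.0)
    print(f"{n_cand} of {len(data)} series survive filtering; {len(hits)} match")
```

Because the lower bound is conservative, the refinement step returns exactly the same answers as a brute-force scan; the grid resolution (`bits`) only controls how many false candidates survive the filtering step and must then be checked against the raw data.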
Pages: 333-348
Page count: 16
Related papers (50 in total)
  • [1] Identifying Label Noise in Time-Series Datasets
    Atkinson, Gentry
    Metsis, Vangelis
    UBICOMP/ISWC '20 ADJUNCT: PROCEEDINGS OF THE 2020 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2020 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS, 2020, : 238 - 243
  • [2] A Process-Oriented Method for Tracking Rainstorms with a Time-Series of Raster Datasets
    Xue, Cunjin
    Liu, Jingyi
    Yang, Guanghui
    Wu, Chengbin
    APPLIED SCIENCES-BASEL, 2019, 9 (12):
  • [3] Clustering of large time series datasets
    Aghabozorgi, Saeed
    Teh, Ying Wah
    INTELLIGENT DATA ANALYSIS, 2014, 18 (05) : 793 - 817
  • [4] Cluster analysis of long time-series medical datasets
    Hirano, S
    Tsumoto, S
    DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS, AND TECHNOLOGY VI, 2004, 5433 : 13 - 20
  • [5] Forecasting Time-Series Trends by Merging Structured and Unstructured Datasets
    Park, Ji Sang
    Cho, Hyeon Sung
    Lee, Ji Sung
    Chung, Kyo-Il
    Kim, Jeong Min
    Kim, Dong Jin
    11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020, : 1230 - 1233
  • [7] Mining single-cell time-series datasets with Time Course Inspector
    Dobrzynski, Maciej
    Jacques, Marc-Antoine
    Pertz, Olivier
    BIOINFORMATICS, 2020, 36 (06) : 1968 - 1969
  • [8] TIME-SERIES SEGMENTATION - A MODEL AND A METHOD
    SCLOVE, SL
    INFORMATION SCIENCES, 1983, 29 (01) : 7 - 25
  • [9] ON THE METHOD OF COEFFICIENTS IN CREATION OF TIME-SERIES
    MATHE, S
    EKONOMICKY CASOPIS, 1989, 37 (11) : 1021 - 1036
  • [10] A METHOD OF EDITING TIME-SERIES OBSERVATIONS
    HALPENNY, J
    GEOPHYSICS, 1984, 49 (05) : 521 - 524