Optimal Distance Bounds for Fast Search on Compressed Time-Series Query Logs

Cited by: 3
Authors
Vlachos, Michail [1]
Kozat, Suleyman S. [2]
Yu, Philip S. [3]
Affiliations
[1] IBM Zurich Res Lab, Ruschlikon, Switzerland
[2] Koc Univ, Dept Elect Engn, Istanbul, Turkey
[3] Univ Illinois, Dept Comp Sci, Chicago, IL USA
Keywords
Algorithms
DOI
10.1145/1734200.1734203
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Consider a database of time series in which each data point records the total number of users who issued a specific query at an Internet search engine. Storing and analyzing such logs can benefit a search company in several ways. First, from a data organization perspective, query Weblogs capture important trends and statistics, so they can help enhance and optimize the search experience (keyword recommendation, discovery of news events). Second, Weblog data can provide an important polling mechanism for the microeconomic aspects of a search engine, since it can facilitate and promote the advertising facet of the search engine (understanding what users request and when they request it). Because of the sheer volume of time-series Weblogs, manipulating the logs in compressed form is a pressing necessity for fast data processing and compact storage. Here, we show how to compute lower and upper distance bounds on the time-series logs when working directly on their compressed form. Optimal distance estimation means tighter bounds, which lead to better candidate selection and elimination and ultimately to faster search. Our derivation of the optimal distance bounds is based on a careful analysis of the problem using optimization principles. The experimental evaluation suggests a clear performance advantage of the proposed method over previous compression/search techniques. The presented method yields a 10-30% improvement in distance estimation, which in turn leads to a 25-80% improvement in search performance.
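To make the idea of bounding distances on compressed data concrete, the following is a minimal sketch and not the optimal bounds derived in the paper. It assumes a simple compression scheme (keep the k largest-magnitude coefficients of an orthonormal DFT plus the total energy of the discarded ones) and uses the triangle inequality on the discarded part to bracket the Euclidean distance between an uncompressed query and a compressed series. The function names `dft_ortho`, `compress`, and `distance_bounds` are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft_ortho(x):
    """Orthonormal DFT, so Euclidean distances are preserved (Parseval)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def compress(x, k):
    """Assumed scheme: keep the k largest coefficients and the leftover energy."""
    coeffs = dft_ortho(x)
    top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    kept = {i: coeffs[i] for i in top}
    residual = sum(abs(c) ** 2 for i, c in enumerate(coeffs) if i not in kept)
    return kept, residual


def distance_bounds(query, compressed):
    """Lower and upper bounds on the Euclidean distance to the original series.

    The query must have the same length as the series that was compressed.
    """
    kept, residual_x = compressed
    q = dft_ortho(query)
    # Exact contribution of the positions whose coefficients were stored.
    exact_part = sum(abs(q[i] - c) ** 2 for i, c in kept.items())
    # Energy of the query at the positions that were discarded.
    residual_q = sum(abs(c) ** 2 for i, c in enumerate(q) if i not in kept)
    rx, rq = math.sqrt(residual_x), math.sqrt(residual_q)
    # Triangle inequality on the discarded part: |rx - rq| <= d_rest <= rx + rq.
    lower = math.sqrt(exact_part + (rx - rq) ** 2)
    upper = math.sqrt(exact_part + (rx + rq) ** 2)
    return lower, upper
```

In a search setting, a candidate whose lower bound exceeds the best upper bound seen so far can be discarded without decompressing it; tighter bounds, such as the optimal ones the abstract describes, would prune more candidates than this simple bracket.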
Pages: 28