Optimal Distance Bounds for Fast Search on Compressed Time-Series Query Logs

Cited by: 3
Authors
Vlachos, Michail [1]
Kozat, Suleyman S. [2]
Yu, Philip S. [3]
Affiliations
[1] IBM Zurich Res Lab, Ruschlikon, Switzerland
[2] Koc Univ, Dept Elect Engn, Istanbul, Turkey
[3] Univ Illinois, Dept Comp Sci, Chicago, IL USA
Keywords
Algorithms
DOI
10.1145/1734200.1734203
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Consider a database of time series in which each data point records the total number of users who issued a specific query at an Internet search engine. Storing and analyzing such logs can benefit a search company in several ways. First, from a data organization perspective, query Weblogs capture important trends and statistics, so they can help enhance and optimize the search experience (keyword recommendation, discovery of news events). Second, Weblog data can provide an important polling mechanism for the microeconomic aspects of a search engine, since it can facilitate and promote the advertising facet of the search engine (understanding what users request and when they request it). Because of the sheer volume of time-series Weblogs, manipulating the logs in compressed form is a pressing necessity for fast data processing and compact storage. Here, we explicate how to compute the lower and upper distance bounds on the time-series logs when working directly on their compressed form. Optimal distance estimation means tighter bounds, leading to better candidate selection/elimination and ultimately faster search performance. Our derivation of the optimal distance bounds is based on a careful analysis of the problem using optimization principles. The experimental evaluation suggests a clear performance advantage of the proposed method over previous compression/search techniques. The presented method yields a 10-30% improvement in distance estimation, which in turn leads to a 25-80% improvement in search performance.
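To make the idea of lower and upper distance bounds on compressed data concrete, here is a minimal sketch. It is not the authors' optimal derivation. It assumes a simpler, standard scheme: each series is stored as its first k coefficients under an orthonormal DFT plus the total energy of the discarded coefficients, and the unknown part of the distance is bounded with the triangle inequality. The function names (ortho_dft, compress, distance_bounds) and the choice of the first k coefficients are illustrative assumptions, not taken from the paper.

```python
import cmath
import math


def ortho_dft(x):
    """Naive O(n^2) DFT scaled by 1/sqrt(n), so Euclidean norms are preserved."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def compress(x, k):
    """Hypothetical compressed form: first k coefficients + discarded energy."""
    coeffs = ortho_dft(x)
    residual_energy = sum(abs(c) ** 2 for c in coeffs[k:])
    return coeffs[:k], residual_energy


def distance_bounds(cx, cq):
    """Lower and upper bounds on the Euclidean distance of the original series."""
    (kept_x, ex), (kept_q, eq) = cx, cq
    # Exact contribution of the coefficients that both series retain.
    known = sum(abs(a - b) ** 2 for a, b in zip(kept_x, kept_q))
    # The discarded parts have norms rx and rq; by the triangle inequality
    # their difference has a norm between |rx - rq| and rx + rq.
    rx, rq = math.sqrt(ex), math.sqrt(eq)
    lower = math.sqrt(known + (rx - rq) ** 2)
    upper = math.sqrt(known + (rx + rq) ** 2)
    return lower, upper


def euclidean(x, q):
    """Exact distance on the uncompressed series, for comparison only."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, q)))


if __name__ == "__main__":
    x = [3, 5, 8, 13, 9, 4, 2, 1]
    q = [2, 6, 7, 11, 10, 6, 3, 1]
    lo, hi = distance_bounds(compress(x, 3), compress(q, 3))
    print(lo, euclidean(x, q), hi)
```

In a nearest-neighbor search, a candidate whose lower bound exceeds the smallest upper bound seen so far can be discarded without decompressing it, which is the candidate elimination step the abstract refers to. Tighter bounds than these prune more candidates.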
Pages: 28