Optimal Distance Bounds for Fast Search on Compressed Time-Series Query Logs

Cited by: 3
Authors
Vlachos, Michail [1]
Kozat, Suleyman S. [2]
Yu, Philip S. [3]
Affiliations
[1] IBM Zurich Res Lab, Ruschlikon, Switzerland
[2] Koc Univ, Dept Elect Engn, Istanbul, Turkey
[3] Univ Illinois, Dept Comp Sci, Chicago, IL USA
Keywords
Algorithms
DOI
10.1145/1734200.1734203
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Consider a database of time-series, where each datapoint in the series records the total number of users who issued a specific query at an Internet search engine. Storage and analysis of such logs can benefit a search company from multiple perspectives. First, from a data organization perspective, because query Weblogs capture important trends and statistics, they can help enhance and optimize the search experience (keyword recommendation, discovery of news events). Second, Weblog data can provide an important polling mechanism for the microeconomic aspects of a search engine, since it can facilitate and promote the advertising facet of the search engine (understanding what users request and when they request it). Due to the sheer volume of time-series Weblogs, manipulating the logs in compressed form is a pressing necessity for fast data processing and compact storage. Here, we explain how to compute the lower and upper distance bounds on the time-series logs when working directly on their compressed form. Optimal distance estimation means tighter bounds, leading to better candidate selection/elimination and ultimately faster search performance. Our derivation of the optimal distance bounds is based on a careful analysis of the problem using optimization principles. The experimental evaluation suggests a clear performance advantage of the proposed method over previous compression/search techniques. The presented method yields a 10-30% improvement in distance estimation, which in turn leads to a 25-80% improvement in search performance.
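To make the idea of bounding a distance on compressed data concrete, the following is a minimal sketch and not the optimal bounds derived in the paper. It assumes a simple compression scheme chosen here for illustration: keep the k largest-magnitude coefficients of an orthonormal DFT plus the total energy of the discarded coefficients. The bounds come from the triangle inequality applied to the discarded part. The function names `dft`, `compress`, and `distance_bounds` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT, so Euclidean distances are preserved (Parseval)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def compress(x, k):
    """Keep the k largest-magnitude coefficients and the energy of the rest."""
    X = dft(x)
    order = sorted(range(len(X)), key=lambda i: abs(X[i]), reverse=True)
    kept = {i: X[i] for i in order[:k]}
    omitted_energy = sum(abs(X[i]) ** 2 for i in range(len(X)) if i not in kept)
    return kept, omitted_energy


def distance_bounds(query, compressed):
    """Lower and upper bounds on the Euclidean distance between an
    uncompressed query and a compressed series (triangle-inequality bounds)."""
    kept, omitted_energy = compressed
    Q = dft(query)
    # Exact squared distance over the coefficients that were stored.
    exact = sum(abs(Q[i] - c) ** 2 for i, c in kept.items())
    # Norms of the query and the series over the discarded positions.
    e_q = math.sqrt(sum(abs(Q[i]) ** 2 for i in range(len(Q)) if i not in kept))
    e_x = math.sqrt(omitted_energy)
    lower = math.sqrt(exact + (e_x - e_q) ** 2)
    upper = math.sqrt(exact + (e_x + e_q) ** 2)
    return lower, upper


# Illustrative data only; not taken from the paper.
x = [3.0, 5.0, 9.0, 4.0, 2.0, 6.0, 8.0, 1.0]
q = [2.0, 6.0, 7.0, 5.0, 3.0, 4.0, 9.0, 2.0]
lower, upper = distance_bounds(q, compress(x, 3))
true_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, q)))
print(lower, true_distance, upper)
```

In a search setting, a candidate whose lower bound exceeds the best upper bound seen so far can be discarded without decompressing it, which is why tighter bounds translate into faster search.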
Pages: 28