Optimal Distance Bounds for Fast Search on Compressed Time-Series Query Logs

Cited by: 3
Authors
Vlachos, Michail [1]
Kozat, Suleyman S. [2]
Yu, Philip S. [3]
Affiliations
[1] IBM Zurich Res Lab, Ruschlikon, Switzerland
[2] Koc Univ, Dept Elect Engn, Istanbul, Turkey
[3] Univ Illinois, Dept Comp Sci, Chicago, IL USA
Keywords
Algorithms;
DOI
10.1145/1734200.1734203
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Consider a database of time series, where each data point in a series records the total number of users who issued a specific query at an Internet search engine. Storing and analyzing such logs can benefit a search company from multiple perspectives. First, from a data-organization perspective, query Weblogs capture important trends and statistics, so they can help enhance and optimize the search experience (keyword recommendation, discovery of news events). Second, Weblog data can provide an important polling mechanism for the microeconomic aspects of a search engine, since it can facilitate and promote the advertising facet of the search engine (understanding what users request and when they request it). Because of the sheer volume of time-series Weblogs, manipulating the logs in compressed form is an impending necessity for fast data processing and compact storage. Here, we explicate how to compute lower and upper distance bounds on the time-series logs when working directly on their compressed form. Optimal distance estimation means tighter bounds, leading to better candidate selection/elimination and ultimately faster search performance. Our derivation of the optimal distance bounds is based on a careful analysis of the problem using optimization principles. The experimental evaluation suggests a clear performance advantage of the proposed method over previous compression/search techniques. The presented method yields a 10-30% improvement in distance estimation, which in turn leads to a 25-80% improvement in search performance.
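To make the idea of bounding a distance from compressed data concrete, the following is a minimal sketch and not the optimal bounds derived in the paper. It assumes a compression scheme the abstract does not specify: each series is stored as its k largest-magnitude coefficients under an orthonormal discrete Fourier transform, plus the total energy of the discarded coefficients. Under that assumption, the triangle inequality gives simple (generally looser) lower and upper bounds on the Euclidean distance to an uncompressed query. The function names `ortho_dft`, `compress`, and `distance_bounds`, and the sample data, are hypothetical.

```python
import cmath
import math


def ortho_dft(x):
    """Orthonormal DFT (naive O(n^2)); it preserves Euclidean distances."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def compress(x, k):
    """Assumed scheme: keep the k largest coefficients and the residual energy."""
    coeffs = ortho_dft(x)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in order[:k]}
    residual_energy = sum(abs(coeffs[i]) ** 2 for i in order[k:])
    return kept, residual_energy


def distance_bounds(query, compressed):
    """Triangle-inequality bounds on the distance from query to a compressed series."""
    kept, residual_energy = compressed
    q = ortho_dft(query)
    # Exact contribution from the positions where the series coefficients are known.
    exact = sum(abs(q[i] - c) ** 2 for i, c in kept.items())
    # Norms of the query and of the series over the discarded positions.
    q_rest = math.sqrt(sum(abs(q[i]) ** 2 for i in range(len(q)) if i not in kept))
    x_rest = math.sqrt(residual_energy)
    lower = math.sqrt(exact + (q_rest - x_rest) ** 2)
    upper = math.sqrt(exact + (q_rest + x_rest) ** 2)
    return lower, upper


# Hypothetical daily query counts for one stored series and one search query.
series = [3, 5, 9, 14, 12, 7, 4, 2]
query = [2, 6, 8, 15, 11, 8, 3, 3]

lower, upper = distance_bounds(query, compress(series, 3))
true_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(series, query)))
print(lower, true_distance, upper)
```

In a search setting, a candidate whose lower bound already exceeds the best distance found so far can be discarded without decompressing it, which is why tighter bounds translate into faster search.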
Pages: 28