Optimal Distance Bounds for Fast Search on Compressed Time-Series Query Logs

Cited by: 3

Authors:

Vlachos, Michail [1]

Kozat, Suleyman S. [2]

Yu, Philip S. [3]

Affiliations:

[1] IBM Zurich Res Lab, Ruschlikon, Switzerland

[2] Koc Univ, Dept Elect Engn, Istanbul, Turkey

[3] Univ Illinois, Dept Comp Sci, Chicago, IL USA

Source:

ACM Transactions on the Web | 2010 / Vol. 4 / Issue 2

Keywords:

Algorithms;

DOI:

10.1145/1734200.1734203

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Consider a database of time-series, where each datapoint in the series records the total number of users who issued a specific query at an internet search engine. Storage and analysis of such logs can be very beneficial for a search company from multiple perspectives. First, from a data organization perspective, because query Weblogs capture important trends and statistics, they can help enhance and optimize the search experience (keyword recommendation, discovery of news events). Second, Weblog data can provide an important polling mechanism for the microeconomic aspects of a search engine, since it can facilitate and promote the advertising facet of the search engine (understanding what users request and when they request it). Due to the sheer amount of time-series Weblogs, manipulation of the logs in a compressed form is an impending necessity for fast data processing and compact storage. Here, we explicate how to compute the lower and upper distance bounds on the time-series logs when working directly on their compressed form. Optimal distance estimation means tighter bounds, leading to better candidate selection/elimination and ultimately faster search performance. Our derivation of the optimal distance bounds is based on a careful analysis of the problem using optimization principles. The experimental evaluation suggests a clear performance advantage of the proposed method compared to previous compression/search techniques. The presented method results in a 10-30% improvement in distance estimation, which in turn leads to a 25-80% improvement in search performance.
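
To make the idea of distance bounds on compressed data concrete, here is a minimal Python sketch. It is not the optimal bound derived in the paper; it is a simpler, well-known baseline that assumes each series is compressed by keeping its largest-magnitude coefficients under an orthonormal Fourier transform, together with the total energy of the discarded coefficients. The function names (`dft`, `compress`, `distance_bounds`) and the choice of transform are illustrative assumptions, and the query is assumed to be available uncompressed. The bounds follow from the triangle inequality applied to the discarded part.

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT, so Euclidean distances are preserved (Parseval)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def compress(x, keep):
    """Keep the `keep` largest-magnitude coefficients and the energy of the rest."""
    coeffs = dft(x)
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = {k: coeffs[k] for k in order[:keep]}
    residual = sum(abs(c) ** 2 for k, c in enumerate(coeffs) if k not in kept)
    return kept, residual


def distance_bounds(compressed, query):
    """Lower and upper bounds on the Euclidean distance to an uncompressed query."""
    kept, residual = compressed
    q = dft(query)
    # Exact contribution of the positions whose coefficients were stored.
    known = sum(abs(c - q[k]) ** 2 for k, c in kept.items())
    # Norms of the discarded part of the data and of the query at those positions.
    x_rest = math.sqrt(residual)
    q_rest = math.sqrt(sum(abs(c) ** 2 for k, c in enumerate(q) if k not in kept))
    # Triangle inequality: |a - b| <= ||x_rest - q_rest|| <= a + b.
    lower = math.sqrt(known + (x_rest - q_rest) ** 2)
    upper = math.sqrt(known + (x_rest + q_rest) ** 2)
    return lower, upper
```

In a search setting, a candidate whose lower bound exceeds the best upper bound seen so far can be discarded without decompressing it, which is why tighter bounds translate into faster search.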

Pages: 28
