Many are Better than One: Algorithm Selection for Faster Top-K Retrieval

Cited by: 0

Authors:

Tolosa, Gabriel [1]

Mallia, Antonio [2]

Affiliations:

[1] Univ Nacl Lujan, Dept Ciencias Basicas, Buenos Aires, Argentina

[2] NYU, New York, NY USA

Source:

INFORMATION PROCESSING & MANAGEMENT | 2023 / Vol. 60 / Issue 04

Funding:

US National Science Foundation;

Keywords:

Query processing; Web search; Dynamic pruning; Efficiency; SEARCH;

DOI:

10.1016/j.ipm.2023.103359

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Large-scale search engines have become a fundamental tool to efficiently access information on the Web. Typically, users expect answers in sub-second time frames, which demands highly efficient algorithms to traverse the data structures and return the top-k results. Although various top-k algorithms avoid processing all postings for all query terms, it is not always possible to find one algorithm that is the fastest on every query. The algorithm that is fastest on average does not necessarily perform the best on all queries when evaluated on a per-query basis. To overcome this challenge, we propose to combine different state-of-the-art disjunctive top-k query processing algorithms to minimize the execution time by selecting the most promising one for each query. We model the selection step as a classification problem in a machine-learning setup. We conduct extensive experimentation and compare the results against state-of-the-art baselines using standard document collections and query sets. On ClueWeb12, our proposal shows a speed-up of up to 1.20x for non-blocked index organizations and 1.19x for block-based ones. Moreover, tail latencies are reduced: improvements are proportional on average, while latency variance decreases dramatically. Given these findings, the proposed approach can be easily applied to existing search infrastructures to speed up query processing and reduce resource consumption, positively impacting providers' operating costs.
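
The per-query selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the feature set (number of query terms, log of the longest posting list), the algorithm names `algo_a` and `algo_b`, the latency figures, and the 1-nearest-neighbour classifier are all assumptions chosen only to show how "pick the fastest algorithm per query" becomes a classification problem.

```python
from math import dist

# Hypothetical training data: (query features, per-algorithm latency in ms).
# Features are (number of query terms, log10 of the longest posting list).
# Features, algorithm names, and timings are illustrative assumptions.
TRAINING = [
    ((2, 4.0), {"algo_a": 1.2, "algo_b": 2.5}),
    ((2, 6.5), {"algo_a": 3.0, "algo_b": 4.1}),
    ((6, 6.0), {"algo_a": 9.8, "algo_b": 5.2}),
    ((8, 5.5), {"algo_a": 12.4, "algo_b": 6.0}),
]


def fastest(latencies):
    """Return the name of the algorithm with the lowest measured latency."""
    return min(latencies, key=latencies.get)


def train(samples):
    """Label each training query with its fastest algorithm."""
    return [(features, fastest(latencies)) for features, latencies in samples]


def select_algorithm(model, features):
    """1-nearest-neighbour classifier: reuse the label of the closest training query."""
    _, label = min(model, key=lambda pair: dist(pair[0], features))
    return label


model = train(TRAINING)
choice = select_algorithm(model, (7, 5.8))  # algorithm to run for this query
```

In a real system the predicted label would dispatch the query to the corresponding top-k implementation; any classifier could replace the nearest-neighbour rule used here.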


Pages: 26

