Many are Better than One: Algorithm Selection for Faster Top-K Retrieval

Authors
Tolosa, Gabriel [1]
Mallia, Antonio [2]
Affiliations
[1] Univ Nacl Lujan, Dept Ciencias Basicas, Buenos Aires, Argentina
[2] NYU, New York, NY USA
Funding
US National Science Foundation
Keywords
Query processing; Web search; Dynamic pruning; Efficiency; Search
DOI
10.1016/j.ipm.2023.103359
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Large-scale search engines have become a fundamental tool for efficiently accessing information on the Web. Users typically expect answers within sub-second time frames, which demands highly efficient algorithms for traversing the underlying data structures and returning the top-k results. Although several top-k algorithms avoid processing all postings of all query terms, no single algorithm is always the fastest on every query: the algorithm that is fastest on average does not necessarily perform best when evaluated on a per-query basis. To overcome this challenge, we propose combining different state-of-the-art disjunctive top-k query processing algorithms and minimizing execution time by selecting the most promising one for each query. We model the selection step as a classification problem in a machine-learning setup. We conduct extensive experiments and compare the results against state-of-the-art baselines using standard document collections and query sets. On ClueWeb12, our proposal achieves speed-ups of up to 1.20x for non-blocked index organizations and 1.19x for block-based ones. Moreover, tail latencies are reduced in proportion to the average improvements, while latency variance decreases dramatically. Given these findings, the proposed approach can be easily applied to existing search infrastructures to speed up query processing and reduce resource consumption, lowering providers' operating costs.
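As a rough illustration of the selection step described in the abstract, the sketch below labels each training query with the algorithm that ran fastest on it, fits a classifier on query features, and then dispatches each new query to the predicted algorithm. It is not the authors' method. The candidate set (MaxScore, WAND, BMW), the two features, the k-nearest-neighbour classifier, the helper names (`fastest`, `KNNSelector`, `total_time`, `speedup_over_best_single`), and all timings are hypothetical placeholders. Measuring speed-up against the best single fixed algorithm is also an assumption made here for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical candidate set; the algorithms actually used in the paper may differ.
ALGORITHMS = ("MaxScore", "WAND", "BMW")


def fastest(times):
    """Label a query with the algorithm that had the lowest measured time."""
    return min(times, key=times.get)


class KNNSelector:
    """Toy k-nearest-neighbour classifier: query features -> algorithm label."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # (feature vector, fastest algorithm) pairs

    def fit(self, features, times):
        # Training labels come from per-query timings of every candidate.
        self.samples = [(f, fastest(t)) for f, t in zip(features, times)]
        return self

    def predict(self, f):
        # Majority vote among the k training queries closest in feature space.
        nearest = sorted(self.samples, key=lambda s: dist(s[0], f))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


def total_time(times, choices):
    """Sum the time of the chosen algorithm over all queries."""
    return sum(t[c] for t, c in zip(times, choices))


def speedup_over_best_single(times, choices):
    """Best fixed algorithm's total time divided by the per-query selection's."""
    best_single = min(sum(t[a] for t in times) for a in ALGORITHMS)
    return best_single / total_time(times, choices)


# Made-up features (number of terms, a posting-length statistic) and timings in ms.
train_X = [(2, 1.0), (2, 1.2), (3, 1.1), (6, 5.0), (7, 5.5), (6, 6.0)]
train_T = [
    {"MaxScore": 1.0, "WAND": 2.0, "BMW": 1.5},
    {"MaxScore": 1.1, "WAND": 2.2, "BMW": 1.6},
    {"MaxScore": 1.2, "WAND": 2.1, "BMW": 1.4},
    {"MaxScore": 9.0, "WAND": 8.0, "BMW": 4.0},
    {"MaxScore": 9.5, "WAND": 7.5, "BMW": 4.2},
    {"MaxScore": 8.8, "WAND": 8.1, "BMW": 3.9},
]
test_X = [(2, 1.1), (7, 5.8)]
test_T = [
    {"MaxScore": 1.0, "WAND": 2.0, "BMW": 1.6},
    {"MaxScore": 9.0, "WAND": 8.0, "BMW": 4.0},
]

selector = KNNSelector(k=3).fit(train_X, train_T)
choices = [selector.predict(f) for f in test_X]
print(choices, f"{speedup_over_best_single(test_T, choices):.2f}")
# prints ['MaxScore', 'BMW'] 1.12
```

With these made-up numbers the selector routes the short query to MaxScore and the long one to BMW, so it beats every fixed choice. On real workloads the gain would depend on how well the chosen features separate the regions where each algorithm wins, and on the cost of computing those features at query time.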
Pages: 26
Related papers
50 in total; 10 listed
  • [1] Faster Compressed Top-k Document Retrieval
    Hon, Wing-Kai
    Shah, Rahul
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    2013 DATA COMPRESSION CONFERENCE (DCC), 2013, pp. 341-350
  • [2] Faster Compact Top-k Document Retrieval
    Konow, Roberto
    Navarro, Gonzalo
    2013 DATA COMPRESSION CONFERENCE (DCC), 2013, pp. 351-360
  • [3] Faster Top-k Document Retrieval in Optimal Space
    Navarro, Gonzalo
    Thankachan, Sharma V.
    STRING PROCESSING AND INFORMATION RETRIEVAL (SPIRE 2013), 2013, vol. 8214, pp. 255-262
  • [4] Faster Top-k Document Retrieval Using Block-Max Indexes
    Ding, Shuai
    Suel, Torsten
    PROCEEDINGS OF THE 34TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR'11), 2011, pp. 993-1002
  • [5] A Top-K Retrieval algorithm based on a decomposition of ranking functions
    Madrid, Nicolas
    Rusnok, Pavel
    INFORMATION SCIENCES, 2019, vol. 474, pp. 136-153
  • [6] Finding the Best of Both Worlds: Faster and More Robust Top-k Document Retrieval
    Khattab, Omar
    Hammoud, Mohammad
    Elsayed, Tamer
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, pp. 1031-1040
  • [7] Efficient Top-K Retrieval with Signatures
    Chappell, Timothy
    Geva, Shlomo
    Nguyen, Anthony
    Zuccon, Guido
    PROCEEDINGS OF THE 18TH AUSTRALASIAN DOCUMENT COMPUTING SYMPOSIUM (ADCS 2013), 2013, pp. 10-17
  • [8] Scalable Top-K Retrieval with Sparta
    Sheffi, Gali
    Basin, Dmitry
    Bortnikov, Edward
    Carmel, David
    Keidar, Idit
    PROCEEDINGS OF THE 25TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '20), 2020, pp. 62-73
  • [9] Diversifying Top-k Service Retrieval
    Sha, Chaofeng
    Wang, Keqiang
    Zhang, Kai
    Wang, Xiaoling
    Zhou, Aoying
    2014 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2014), 2014, pp. 227-234
  • [10] Reliable Retrieval of Top-k Tags
    Xu, Yong
    Cheng, Reynold
    Zheng, Yudian
    WEB INFORMATION SYSTEMS ENGINEERING, WISE 2017, PT I, 2017, vol. 10569, pp. 330-346