Optimal algorithms for selecting top-k combinations of attributes: theory and applications

Cited by: 6
Authors
Lin, Chunbin [1]
Lu, Jiaheng [2]
Wei, Zhewei [3]
Wang, Jianguo [1]
Xiao, Xiaokui [4]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA
[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
[3] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[4] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Source
VLDB JOURNAL | 2018 / Vol. 27 / Issue 01
Funding
Academy of Finland;
Keywords
Top-k query; Top-k m query; Instance optimal algorithm; KEYWORD SEARCH; RELATIONAL DATABASES; QUERIES;
DOI
10.1007/s00778-017-0485-2
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover the k objects, e.g., the top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications such as query recommendation require the best combinations of attributes rather than the best objects. A straightforward extension of existing top-k algorithms is prohibitively expensive for answering top-k combination queries because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses, and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for efficient query evaluation that reduce the computational cost, the memory cost and the number of accesses. We provide a case study on real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k, m algorithms on multiple real-life datasets.
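To make the query definition concrete, here is a minimal Python sketch of the naive enumeration baseline that the abstract describes as prohibitively expensive. It is not the authors' algorithm. The data layout (`groups`), the function name `top_k_m`, the use of summation as the aggregate, and the rule that only objects present in every selected list are scored are all assumptions made for illustration; the abstract does not specify them.

```python
from heapq import nlargest
from itertools import product


def top_k_m(groups, k, m):
    """Naive top-k,m: score every combination that picks one attribute per group."""
    scored = []
    for combo in product(*(sorted(g.items()) for g in groups)):
        names = tuple(name for name, _ in combo)
        lists = [scores for _, scores in combo]
        # Assumption: only objects that appear in every selected list are scored.
        shared = set(lists[0]).intersection(*lists[1:])
        # Assumption: an object's overall score is the sum of its per-attribute scores.
        totals = [sum(lst[obj] for lst in lists) for obj in shared]
        # A combination's score is the sum of its m best object scores.
        scored.append((sum(nlargest(m, totals)), names))
    return nlargest(k, scored)


# Hypothetical data: two attribute groups, each mapping attribute -> {object: score}.
groups = [
    {"A1": {"o1": 9, "o2": 5, "o3": 4}, "A2": {"o1": 2, "o2": 3}},
    {"B1": {"o1": 8, "o2": 7}, "B2": {"o2": 9, "o3": 6}},
]
result = top_k_m(groups, k=2, m=2)
print(result)  # [(29, ('A1', 'B1')), (24, ('A1', 'B2'))]
```

The loop visits the full Cartesian product of the groups, so its cost grows exponentially with the number of groups. That is the blow-up the abstract says the proposed sorted-access and random-access algorithms are designed to avoid.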
Pages: 27 - 52
Page count: 26