Optimal algorithms for selecting top-k combinations of attributes: theory and applications

Cited by: 6
Authors
Lin, Chunbin [1]
Lu, Jiaheng [2]
Wei, Zhewei [3]
Wang, Jianguo [1]
Xiao, Xiaokui [4]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA
[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
[3] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[4] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Source
VLDB JOURNAL | 2018 / Vol. 27 / No. 1
Funding
Academy of Finland
Keywords
Top-k query; Top-k,m query; Instance optimal algorithm; Keyword search; Relational databases; Queries
DOI
10.1007/s00778-017-0485-2
CLC classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover the k objects (e.g., top-k restaurants) with the highest overall scores aggregated from different attributes (e.g., price and location). However, emerging applications such as query recommendation require the best combinations of attributes, rather than the best objects. A straightforward extension of existing top-k algorithms is prohibitively expensive for answering top-k combinations, because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k,m, which aims to find the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k,m algorithms covering different data access methods (sorted accesses and random accesses) and different query certainties (exact query processing and approximate query processing). On the theoretical side, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for query evaluation that reduce the computational cost, the memory cost and the number of accesses. We present a case study of real applications of top-k,m queries in an online biomedical search engine. Finally, we perform comprehensive experiments on multiple real-life datasets to demonstrate the scalability and efficiency of the top-k,m algorithms.
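The abstract notes that naively extending existing top-k algorithms requires enumerating every combination. The Python sketch below illustrates only that brute-force baseline, under a simplified model that is an assumption rather than the article's exact definition: each group holds alternative attributes, a combination picks one attribute per group, an object's score within a combination is the sum of its scores over the chosen attributes (a missing score counts as 0), and a combination's score is the sum of its top-m object scores. The names `naive_top_k_m` and `combination_score` and the input layout are hypothetical; this is not the authors' instance-optimal algorithm.

```python
import heapq
from itertools import product

# Hypothetical input layout (assumption): groups[i] maps an attribute name
# to {object_id: score} for that attribute.
Group = dict[str, dict[str, float]]


def combination_score(lists: list[dict[str, float]], m: int) -> float:
    """Sum of the top-m aggregated object scores for one combination."""
    totals: dict[str, float] = {}
    for scores in lists:
        for obj, s in scores.items():
            # Aggregate by summation; objects absent from a list contribute 0.
            totals[obj] = totals.get(obj, 0.0) + s
    return sum(heapq.nlargest(m, totals.values()))


def naive_top_k_m(groups: list[Group], k: int, m: int) -> list[tuple[float, tuple[str, ...]]]:
    """Brute-force top-k,m: score every combination (one attribute per group)."""
    scored = []
    # product() yields one (name, scores) pair per group, i.e. one combination;
    # the number of combinations is the product of the group sizes.
    for combo in product(*(g.items() for g in groups)):
        names = tuple(name for name, _ in combo)
        lists = [scores for _, scores in combo]
        scored.append((combination_score(lists, m), names))
    # Keep the k best combinations, highest score first.
    return heapq.nlargest(k, scored)
```

For example, with `groups = [{"price_low": {"r1": 9, "r2": 5, "r3": 1}, "price_mid": {"r1": 2, "r2": 8, "r3": 5}}, {"near": {"r1": 7, "r2": 3, "r3": 9}, "far": {"r1": 1, "r2": 4, "r3": 2}}]`, the call `naive_top_k_m(groups, k=2, m=2)` returns `[(26.0, ('price_low', 'near')), (25.0, ('price_mid', 'near'))]`. With n groups of g attributes each, the loop visits g^n combinations, which is the exponential cost that motivates the article's algorithms.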
Pages: 27-52
Page count: 26