Improving estimation accuracy of aggregate queries on data cubes

Cited by: 4
Authors
Pourabbas, E. [1]
Shoshani, A. [2]
Affiliations
[1] Ist Anal Sistemi & Informat Antonio Ruberti, Italian Natl Res Council, I-00185 Rome, Italy
[2] Univ Calif Berkeley, Lawrence Berkeley Lab, Berkeley, CA 94720 USA
Keywords
Query estimation; Entropy; Accuracy analysis
DOI
10.1016/j.datak.2009.08.010
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the problem of estimating a target database from summary databases derived from a base data cube. We show that such estimates can be derived by choosing a primary database that has the desired target measure but not the desired dimensions, and using a proxy database to estimate the results. This technique is common in statistics, but an important issue we address is the accuracy of these estimates. Specifically, given multiple primary and multiple proxy databases, the problem is how to select the primary and proxy databases that will generate the most accurate estimate of the target database. We propose an algorithmic approach that uses the principles of information entropy to determine the steps for selecting or computing the primary and proxy databases that provide the most accurate target database. We show that the primary database with the largest number of cells in common with the target database and the proxy database provides more accurate estimates. We prove that this is consistent with maximizing the entropy. We provide experimental results on the accuracy of the target database estimation to verify our results. Furthermore, we investigate the accuracy results in cases where the dimensions are defined over a hierarchy of categories and roll-up and drill-down operations are needed to generate the desired target results. © 2009 Elsevier B.V. All rights reserved.
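As a rough illustration of the primary/proxy idea in the abstract, the sketch below spreads a primary measure across a missing dimension in proportion to a proxy database, and computes the Shannon entropy of a set of cells. This is a generic proportional-allocation sketch, not the authors' algorithm: the abstract does not give the estimation formula, and the function names (`estimate_target`, `entropy`) and the sample figures are hypothetical.

```python
from math import log

def estimate_target(primary, proxy):
    """Estimate target cells keyed by (a, b).

    primary: {a: measure total}        -- has the measure, lacks dimension b
    proxy:   {(a, b): proxy weight}    -- has both dimensions, another measure
    Assumption: the measure splits across b in proportion to the proxy.
    """
    # Total proxy weight for each value of the shared dimension a.
    proxy_totals = {}
    for (a, _b), weight in proxy.items():
        proxy_totals[a] = proxy_totals.get(a, 0.0) + weight

    target = {}
    for (a, b), weight in proxy.items():
        # Skip cells with no primary total or no proxy mass to split by.
        if a in primary and proxy_totals[a] > 0:
            target[(a, b)] = primary[a] * weight / proxy_totals[a]
    return target

def entropy(cells):
    """Shannon entropy (natural log) of the normalized cell values."""
    total = sum(cells.values())
    return -sum((v / total) * log(v / total) for v in cells.values() if v > 0)

# Hypothetical data: a measure by state (primary), counts by state and age (proxy).
primary = {"CA": 100.0, "NY": 60.0}
proxy = {
    ("CA", "young"): 30, ("CA", "old"): 10,
    ("NY", "young"): 5,  ("NY", "old"): 15,
}

estimate = estimate_target(primary, proxy)
for cell, value in sorted(estimate.items()):
    print(cell, value)
print("entropy of estimate:", entropy(estimate))
```

Each state's total is preserved by construction, so the estimated cells for a state sum back to its primary value. Comparing `entropy` across candidate primary/proxy pairs is only a loose stand-in for the entropy-based selection the abstract describes.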
Pages: 50-72
Number of pages: 23
Related papers
50 in total
  • [31] Accuracy vs. lifetime: Linear sketches for aggregate queries in sensor networks
    Puttagunta, Vasundhara
    Kalpakis, Konstantinos
    ALGORITHMICA, 2007, 49 (04) : 357 - 385
  • [32] Aggregate Farthest-Neighbor Queries over Spatial Data
    Gao, Yuan
    Shou, Lidan
    Chen, Ke
    Chen, Gang
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PT II, 2011, 6588 : 149 - 163
  • [33] Processing aggregate queries with materialized views in data warehouse environment
    Chang, JY
    Kim, HJ
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (04) : 726 - 738
  • [34] A Neural Database for Answering Aggregate Queries on Incomplete Relational Data
    Zeighami, Sepanta
    Seshadri, Raghav
    Shahabi, Cyrus
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (07) : 2790 - 2802
  • [35] Capturing Continuous Data and Answering Aggregate Queries in Probabilistic XML
    Abiteboul, Serge
    Chan, T. -H. Hubert
    Kharlamov, Evgeny
    Nutt, Werner
    Senellart, Pierre
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2011, 36 (04):
  • [36] Containment of aggregate queries
    Cohen, S
    Nutt, W
    Sagiv, Y
    DATABASE THEORY ICDT 2003, PROCEEDINGS, 2003, 2572 : 111 - 125
  • [37] Containment of aggregate queries
    Cohen, S
    SIGMOD RECORD, 2005, 34 (01) : 77 - 85
  • [38] Efficient execution of parallel aggregate data cube queries in data warehouse environments
    Tan, RBN
    Taniar, D
    Lu, G
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, 2003, 2690 : 709 - 716
  • [39] Data augmentation method for improving the accuracy of human pose estimation with cropped images
    Park, Soonchan
    Lee, Sang-baek
    Park, Jinah
    PATTERN RECOGNITION LETTERS, 2020, 136 : 244 - 250
  • [40] Answering ad hoc aggregate queries from data streams using prefix aggregate trees
    Moonjung Cho
    Jian Pei
    Ke Wang
    Knowledge and Information Systems, 2007, 12 : 301 - 329