Improving estimation accuracy of aggregate queries on data cubes

Cited by: 4
Authors
Pourabbas, E. [1 ]
Shoshani, A. [2 ]
Affiliations
[1] Ist Anal Sistemi & Informat Antonio Ruberti, Italian Natl Res Council, I-00185 Rome, Italy
[2] Univ Calif Berkeley, Lawrence Berkeley Lab, Berkeley, CA 94720 USA
Keywords
Query estimation; Entropy; Accuracy analysis
DOI
10.1016/j.datak.2009.08.010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the problem of estimating a target database from summary databases derived from a base data cube. We show that such estimates can be derived by choosing a primary database that has the desired target measure but not the desired dimensions, and using a proxy database to estimate the results. This technique is common in statistics, but an important issue we address is the accuracy of these estimates. Specifically, given multiple primary and multiple proxy databases, the problem is how to select the primary and proxy databases that will generate the most accurate target database estimate possible. We propose an algorithmic approach that uses the principles of information entropy to determine the steps for selecting or computing the primary and proxy databases that provide the most accurate target database. We show that the primary database with the largest number of cells in common with the target database and the proxy database provides the most accurate estimates. We prove that this is consistent with maximizing the entropy. We provide experimental results on the accuracy of the target database estimation to verify our results. Furthermore, we investigate the accuracy results in cases where the dimensions are defined over a hierarchy of categories, and roll-up and drill-down operations are needed to generate the desired target results. (C) 2009 Elsevier B.V. All rights reserved.
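The primary/proxy idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm and does not include their entropy-based selection step; it only shows plain proportional allocation, a common statistical way to spread a primary measure over a dimension that only the proxy has. The function name `estimate_target`, the dimension values, and all numbers are hypothetical.

```python
from collections import defaultdict


def estimate_target(primary, proxy):
    """Estimate a target summary by proportional allocation.

    primary: {shared_key: measure_total}
        has the desired measure but lacks one desired dimension.
    proxy: {(shared_key, extra_key): weight}
        has the desired dimensions but a different measure.
    Returns {(shared_key, extra_key): estimated_measure}.
    """
    # Total proxy weight for each value of the shared dimension.
    proxy_totals = defaultdict(float)
    for (shared, _extra), weight in proxy.items():
        proxy_totals[shared] += weight

    target = {}
    for (shared, extra), weight in proxy.items():
        total = proxy_totals[shared]
        # Skip cells that cannot be estimated from the available data.
        if shared not in primary or total == 0:
            continue
        # Split the primary total in proportion to the proxy weights.
        target[(shared, extra)] = primary[shared] * weight / total
    return target


# Hypothetical data: a measure by state (primary) and a
# count by state and age group (proxy).
primary = {"CA": 1000.0, "NY": 600.0}
proxy = {
    ("CA", "young"): 30, ("CA", "old"): 10,
    ("NY", "young"): 20, ("NY", "old"): 20,
}
estimate = estimate_target(primary, proxy)
```

By construction, the estimated cells for each shared key sum back to the primary total. How close each individual cell is to the truth depends on how well the proxy tracks the target measure, which is the accuracy question the paper studies.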
Pages: 50-72
Page count: 23