Histograms as statistical estimators for aggregate queries

Cited by: 5
Authors
Chen, Lixia [1]
Dobra, Alin [1]
Affiliation
[1] University of Florida, Department of Computer and Information Science and Engineering, Gainesville, FL 32611, USA
Funding
National Science Foundation (NSF), USA
Keywords
Histograms; Statistical analysis; Random shuffling assumption; XSketch synopses; Answer sizes; XML
DOI
10.1016/j.is.2012.08.003
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The traditional statistical assumption used to interpret histograms, and to justify the approximate query processing methods based on them, is that all elements in a bucket have the same frequency; this is called the uniform distribution assumption. In this paper, we analyze histograms from a statistical point of view. We show that a significantly less restrictive assumption, namely that the elements within a bucket are randomly arranged even though they may have different frequencies (the random shuffling assumption), leads to identical formulas for approximating aggregate queries using histograms. Under this assumption, we analyze the behavior of both unidimensional and multidimensional histograms and provide tight error guarantees for the quality of the approximations. We conclude that histograms are the best estimators when the assumption holds, and that sampling and sketching are significantly worse. As an example of how the statistical theory of histograms can be extended, we show how XSketches, an approximation technique for XML queries that uses histograms as building blocks, can be analyzed statistically. Combining the random shuffling assumption with the other statistical assumptions associated with XSketch estimators yields a complete statistical model and error analysis for XSketches. Published by Elsevier Ltd.
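The claim that the random shuffling assumption yields the same estimation formula as the uniform distribution assumption can be checked numerically on a small case. The following is a minimal Python sketch, not taken from the paper, under simplifying assumptions: a single bucket holds n frequencies, a query covers k of the bucket's n values, and the histogram answer is the bucket total prorated by k/n. The function names (histogram_estimate, shuffle_moments, shuffle_variance) are hypothetical. The sketch enumerates every arrangement of a deliberately non-uniform bucket and compares the average true answer with the histogram estimate. It also compares the spread of the true answer with the standard variance of a sum of k values drawn without replacement; this is one textbook route to error bounds and is not necessarily the bound derived in the paper.

```python
from fractions import Fraction
from itertools import permutations


def histogram_estimate(bucket_total, bucket_size, k):
    """Histogram answer for a query covering k of the bucket's bucket_size values:
    the bucket total prorated by the covered fraction k / bucket_size."""
    return Fraction(bucket_total * k, bucket_size)


def shuffle_moments(freqs, k):
    """Exact mean and variance of the true query answer when the bucket's
    frequencies are randomly arranged (all orderings equally likely).
    By symmetry, the query can be taken to cover the first k positions."""
    sums = [sum(p[:k]) for p in permutations(freqs)]
    mean = Fraction(sum(sums), len(sums))
    var = sum((s - mean) ** 2 for s in sums) / len(sums)
    return mean, var


def shuffle_variance(freqs, k):
    """Closed form for the same variance: the sum of k values drawn without
    replacement from n values has variance k * sigma^2 * (n - k) / (n - 1),
    where sigma^2 is the population variance of the frequencies."""
    n = len(freqs)
    mu = Fraction(sum(freqs), n)
    sigma2 = sum((f - mu) ** 2 for f in freqs) / n
    return k * sigma2 * (n - k) / (n - 1)


if __name__ == "__main__":
    freqs = [1, 7, 2, 9, 1]  # deliberately non-uniform bucket
    k = 2                    # query covers 2 of the 5 values
    print(histogram_estimate(sum(freqs), len(freqs), k))  # 8
    mean, var = shuffle_moments(freqs, k)
    print(mean, var, shuffle_variance(freqs, k))          # 8 84/5 84/5
```

For the bucket [1, 7, 2, 9, 1] and k = 2, the histogram estimate and the exact mean over all arrangements are both 8, even though the frequencies are far from uniform; the exhaustive and closed-form variances agree at 84/5. Exhaustive enumeration is only practical for tiny buckets and is used here purely as a check.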
Pages: 213-230
Page count: 18