Histograms as statistical estimators for aggregate queries

Cited by: 5
Authors
Chen, Lixia [1]
Dobra, Alin [1]
Affiliations
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA
Funding
US National Science Foundation
Keywords
Histograms; Statistical analysis; Random shuffling assumption; XSKETCH SYNOPSES; ANSWER SIZES; XML;
DOI
10.1016/j.is.2012.08.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The traditional statistical assumption for interpreting histograms, and for justifying approximate query processing methods based on them, is that all elements in a bucket have the same frequency; this is called the uniform distribution assumption. In this paper, we analyze histograms from a statistical point of view. We show that a significantly less restrictive statistical assumption (the elements within a bucket are randomly arranged even though they might have different frequencies) leads to identical formulas for approximating aggregate queries using histograms. Under this assumption, we analyze the behavior of both unidimensional and multidimensional histograms and provide tight error guarantees for the quality of approximations. We conclude that histograms are the best estimators if the assumption holds; sampling and sketching are significantly worse. As an example of how the statistical theory of histograms can be extended, we show how XSketches, an approximation technique for XML queries that uses histograms as building blocks, can be statistically analyzed. The combination of the random shuffling assumption and the other statistical assumptions associated with XSketch estimators ensures a complete statistical model and error analysis for XSketches. Published by Elsevier Ltd.
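As a rough illustration of the kind of formula the abstract refers to, the sketch below estimates a range COUNT query from an equi-width histogram by prorating each bucket's count by the fraction of the bucket that the query range covers. This is the textbook estimate under the uniform distribution assumption, not code from the paper; the function names `build_histogram` and `estimate_range_count`, the equi-width bucketing, and the sample data are all assumptions made for the example.

```python
from typing import List, Sequence


def build_histogram(values: Sequence[float], lo: float, hi: float,
                    num_buckets: int) -> List[int]:
    """Count the values that fall in [lo, hi) into equal-width buckets."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        if lo <= v < hi:
            # Clamp the index to guard against floating-point rounding at the top edge.
            idx = min(int((v - lo) / width), num_buckets - 1)
            counts[idx] += 1
    return counts


def estimate_range_count(counts: Sequence[int], lo: float, hi: float,
                         a: float, b: float) -> float:
    """Estimate how many values lie in [a, b) using only the bucket counts.

    Each bucket contributes its count scaled by the fraction of the
    bucket's width that overlaps the query range.
    """
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


# Hypothetical data: two buckets over [0, 10), four values in each.
values = [0, 1, 1, 2, 5, 5, 5, 9]
counts = build_histogram(values, lo=0, hi=10, num_buckets=2)
estimate = estimate_range_count(counts, lo=0, hi=10, a=2.5, b=7.5)
exact = sum(1 for v in values if 2.5 <= v < 7.5)
```

In this toy data the estimate for the range [2.5, 7.5) is 4.0 while the exact count is 3, because the values are not spread evenly inside each bucket. The error analysis that the abstract describes concerns how large such deviations can be under the stated assumptions.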
Pages: 213-230
Page count: 18