Histograms as statistical estimators for aggregate queries

Cited by: 5

Authors:

Chen, Lixia [1]

Dobra, Alin [1]

Affiliation:

[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA

Source:

INFORMATION SYSTEMS | 2013 / Vol. 38 / Issue 02

Funding:

National Science Foundation (USA);

Keywords:

Histograms; Statistical analysis; Random shuffling assumption; XSKETCH SYNOPSES; ANSWER SIZES; XML;

DOI:

10.1016/j.is.2012.08.003

Chinese Library Classification (CLC):

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

The traditional statistical assumption for interpreting histograms, and for justifying approximate query processing methods based on them, is that all elements in a bucket have the same frequency; this is called the uniform distribution assumption. In this paper, we analyze histograms from a statistical point of view. We show that a significantly less restrictive statistical assumption (the elements within a bucket are randomly arranged, even though they might have different frequencies) leads to identical formulas for approximating aggregate queries using histograms. Under this assumption, we analyze the behavior of both unidimensional and multidimensional histograms and provide tight error guarantees for the quality of approximations. We conclude that histograms are the best estimators if the assumption holds; sampling and sketching are significantly worse. As an example of how the statistical theory of histograms can be extended, we show how XSketches, an approximation technique for XML queries that uses histograms as building blocks, can be statistically analyzed. The combination of the random shuffling assumption and the other statistical assumptions associated with XSketch estimators ensures a complete statistical model and error analysis for XSketches. Published by Elsevier Ltd.
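The abstract does not reproduce the approximation formulas. As a rough illustration only, the sketch below shows the textbook range-COUNT estimate for an equi-width histogram, in which each bucket contributes its total count scaled by the fraction of the bucket that the query range overlaps. The function names `build_histogram` and `estimate_range_count` and the equi-width bucketing are assumptions made for this example; they are not taken from the paper.

```python
def build_histogram(values, lo, hi, num_buckets):
    """Build an equi-width histogram over [lo, hi); returns (counts, width).

    Hypothetical helper for illustration; not the paper's construction.
    """
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that v == hi (or rounding) lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range_count(counts, lo, width, a, b):
    """Estimate COUNT(*) for a <= x < b from bucket totals alone.

    Each bucket contributes count * (overlap length / bucket width).
    """
    total = 0.0
    for i, c in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += c * overlap / width
    return total


# Evenly spread data: the estimate matches the true count.
even_counts, even_width = build_histogram(list(range(10)), 0, 10, 2)
even_estimate = estimate_range_count(even_counts, 0, even_width, 2.5, 7.5)

# Skewed data in a single bucket: the estimate for [0, 1) can be far
# from the true count, which is the gap an error analysis must bound.
skew_counts, skew_width = build_histogram([0, 0, 0, 0, 4], 0, 5, 1)
skew_estimate = estimate_range_count(skew_counts, 0, skew_width, 0, 1)
```

In the skewed case the true count for the range [0, 1) is 4, while the bucket-level estimate spreads the 5 elements evenly across the bucket. An estimate of this form uses only bucket totals, so any error guarantee has to rest on an assumption about how elements are arranged inside a bucket.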


Pages: 213-230

Page count: 18

