Histograms as statistical estimators for aggregate queries

Cited by: 5
Authors
Chen, Lixia [1]
Dobra, Alin [1]
Affiliations
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA
Funding
National Science Foundation (USA)
Keywords
Histograms; Statistical analysis; Random shuffling assumption; XSKETCH SYNOPSES; ANSWER SIZES; XML;
DOI
10.1016/j.is.2012.08.003
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The traditional statistical assumption for interpreting histograms and justifying approximate query processing methods based on them is that all elements in a bucket have the same frequency; this is called the uniform distribution assumption. In this paper, we analyze histograms from a statistical point of view. We show that a significantly less restrictive statistical assumption (the elements within a bucket are randomly arranged even though they might have different frequencies) leads to identical formulas for approximating aggregate queries using histograms. Under this assumption, we analyze the behavior of both unidimensional and multidimensional histograms and provide tight error guarantees for the quality of approximations. We conclude that histograms are the best estimators if the assumption holds; sampling and sketching are significantly worse. As an example of how the statistical theory of histograms can be extended, we show how XSketches, an approximation technique for XML queries that uses histograms as building blocks, can be statistically analyzed. The combination of the random shuffling assumption and the other statistical assumptions associated with XSketch estimators ensures a complete statistical model and error analysis for XSketches. Published by Elsevier Ltd.
Pages: 213-230
Page count: 18
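
The abstract's central claim, that random shuffling within a bucket yields the same estimation formula as the uniform distribution assumption, can be illustrated on a toy bucket. The sketch below is not taken from the paper: the function names, the example frequencies, and the reading of "random shuffling" as "every arrangement of the bucket's frequencies is equally likely" are all assumptions made for illustration. It compares the usual uniform-assumption estimate of a range sum with the exact average of the true answer over all arrangements.

```python
from fractions import Fraction
from itertools import permutations


def uniform_estimate(bucket_total, bucket_width, covered):
    """Range-sum estimate under the uniform distribution assumption.

    The bucket stores only its total frequency and its width; a query that
    covers `covered` of the `bucket_width` positions is credited with that
    fraction of the total.
    """
    return Fraction(bucket_total * covered, bucket_width)


def expected_answer_under_shuffling(freqs, covered):
    """Exact mean of the true range sum when every arrangement of the
    bucket's frequencies is equally likely (illustrative assumption)."""
    total = 0
    count = 0
    for arrangement in permutations(freqs):
        # The query covers the first `covered` positions of the bucket.
        total += sum(arrangement[:covered])
        count += 1
    return Fraction(total, count)


# Hypothetical, deliberately skewed frequencies inside one bucket.
freqs = [40, 1, 1, 2, 30, 6]
covered = 2

estimate = uniform_estimate(sum(freqs), len(freqs), covered)
expected = expected_answer_under_shuffling(freqs, covered)

print("uniform-assumption estimate:", estimate)
print("mean true answer under shuffling:", expected)
```

Exact fractions are used so that the two quantities can be compared without floating-point tolerance. The comparison covers only the expected value; the error guarantees mentioned in the abstract concern the spread around it and are not reproduced here.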