Self-tuning histograms: Building histograms without looking at data

Cited by: 0
Authors
Aboulnaga, A [1]
Chaudhuri, S [1]
Affiliations
[1] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation and Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof, but by using feedback from the query execution engine about the actual selectivity of range selection operators to progressively refine the histogram. Since the cost of building and maintaining self-tuning histograms is independent of the data size, self-tuning histograms provide a remarkably inexpensive way to construct histograms for large data sets with little up-front cost. Self-tuning histograms are particularly attractive as an alternative to multi-dimensional traditional histograms that capture dependencies between attributes but are prohibitively expensive to build and maintain. In this paper, we describe the techniques for initializing and refining self-tuning histograms. Our experimental results show that self-tuning histograms provide a low-cost alternative to traditional multi-dimensional histograms with little loss of accuracy for data distributions with low to moderate skew.
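The feedback loop described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the paper's exact algorithm: the names `Bucket`, `init_histogram`, `estimate`, and `refine`, the `damping` parameter, and the rule of splitting the error in proportion to each bucket's current contribution are all assumptions made for illustration, and bucket restructuring is omitted entirely. The sketch only shows the core idea that the histogram starts from a uniformity assumption and is corrected using the observed selectivity of range queries, without ever scanning the data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    lo: float    # inclusive lower bound
    hi: float    # exclusive upper bound
    freq: float  # estimated number of tuples in [lo, hi)


def init_histogram(lo: float, hi: float, n_buckets: int, n_tuples: float) -> List[Bucket]:
    """Start from a uniformity assumption: equal-width buckets, equal frequencies."""
    width = (hi - lo) / n_buckets
    return [
        Bucket(lo + i * width, lo + (i + 1) * width, n_tuples / n_buckets)
        for i in range(n_buckets)
    ]


def overlap_fraction(b: Bucket, q_lo: float, q_hi: float) -> float:
    """Fraction of the bucket's range covered by the query range [q_lo, q_hi)."""
    overlap = min(b.hi, q_hi) - max(b.lo, q_lo)
    return max(0.0, overlap) / (b.hi - b.lo)


def estimate(hist: List[Bucket], q_lo: float, q_hi: float) -> float:
    """Estimated result size, assuming uniformity inside each bucket."""
    return sum(b.freq * overlap_fraction(b, q_lo, q_hi) for b in hist)


def refine(hist: List[Bucket], q_lo: float, q_hi: float,
           actual: float, damping: float = 0.5) -> None:
    """Use the actual result size reported after execution to adjust the
    overlapping buckets. The damped error is shared in proportion to each
    bucket's contribution to the estimate (an assumed rule for this sketch)."""
    est = estimate(hist, q_lo, q_hi)
    error = actual - est
    touched = [b for b in hist if overlap_fraction(b, q_lo, q_hi) > 0]
    for b in touched:
        contribution = b.freq * overlap_fraction(b, q_lo, q_hi)
        share = contribution / est if est > 0 else 1.0 / len(touched)
        b.freq = max(0.0, b.freq + damping * error * share)


if __name__ == "__main__":
    hist = init_histogram(0, 100, n_buckets=4, n_tuples=1000)
    print(estimate(hist, 0, 50))   # 500.0 under the uniformity assumption
    refine(hist, 0, 50, actual=800)
    print(estimate(hist, 0, 50))   # 650.0, closer to the observed 800
```

Because `refine` touches only the buckets a query overlaps, its cost depends on the number of buckets and not on the size of the underlying table, which mirrors the abstract's claim that maintenance cost is independent of data size.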
Pages: 181-192
Page count: 12
Related papers
50 in total
  • [31] Using smoothed data histograms for cluster visualization in Self-Organizing Maps
    Pampalk, E
    Rauber, A
    Merkl, D
    ARTIFICIAL NEURAL NETWORKS - ICANN 2002, 2002, 2415 : 871 - 876
  • [32] Histograms as a Side Effect of Data Movement for Big Data
    Istvan, Zsolt
    Woods, Louis
    Alonso, Gustavo
    SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 1567 - 1578
  • [33] MULTIVARIATE HISTOGRAMS WITH DATA-DEPENDENT PARTITIONS
    Klemela, Jussi
    STATISTICA SINICA, 2009, 19 (01) : 159 - 176
  • [34] On distributed data aggregation and the precision of approximate histograms
    Gotfryd, Karol
    Cichon, Jacek
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2023, 180
  • [35] POSTERIOR IDENTIFICATION OF HISTOGRAMS CONDITIONAL TO LOCAL DATA
    JOURNEL, AG
    XU, WL
    MATHEMATICAL GEOLOGY, 1994, 26 (03): : 323 - 359
  • [36] Uniform Histograms for Change Detection in Multivariate Data
    Boracchi, Giacomo
    Cervellera, Cristiano
    Maccio, Danilo
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 1732 - 1739
  • [37] Smoothed histograms for frequency data on irregular intervals
    Scott, David W.
    Scott, Warren R.
    AMERICAN STATISTICIAN, 2008, 62 (03): : 256 - 261
  • [38] Measurement of possibilistic histograms from interval data
    Joslyn, C
    INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 1997, 26 (1-2) : 9 - 33
  • [39] Constructing fading histograms from data streams
    Sebastiao, Raquel
    Gama, Joao
    Mendonca, Teresa
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2014, 3 (01) : 15 - 28
  • [40] Comparing Data Distribution Using Fading Histograms
    Sebastiao, Raquel
    Gama, Joao
    Mendonca, Teresa
    21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 1095 - +