Cardinality estimation using normalizing flow

Cited by: 2
Authors
Wang, Jiayi [1]
Chai, Chengliang [2]
Liu, Jiabin [1]
Li, Guoliang [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Inst Technol, Dept Comp Sci & Technol, Beijing, Peoples R China
Source
VLDB JOURNAL | 2024, Vol. 33, Issue 2
Keywords
Cardinality estimation; Query optimization; AI for DB; Prediction
DOI
10.1007/s00778-023-00808-x
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cardinality estimation is one of the most important problems in query optimization. Recently, machine learning-based techniques have been proposed to effectively estimate cardinality, which can be broadly classified into query-driven and data-driven approaches. Query-driven approaches learn a regression model from a query to its cardinality, while data-driven approaches learn a distribution of tuples, select some samples that satisfy a SQL query, and use the data distributions of these selected tuples to estimate the cardinality of the SQL query. As query-driven methods rely on training queries, the estimation quality is not reliable when there are no high-quality training queries, while data-driven methods have no such limitation and have high adaptivity. In this work, we focus on data-driven methods. A good data-driven model should achieve three optimization goals. First, the model needs to capture data dependencies between columns and support large domain sizes (achieving high accuracy). Second, the model should achieve high inference efficiency, because many data samples are needed to estimate the cardinality (achieving low inference latency). Third, the model should not be too large (achieving a small model size). However, existing data-driven methods cannot simultaneously optimize the three goals. To address the limitations, we propose a novel cardinality estimator FACE, which leverages the normalizing flow-based model to learn a continuous joint distribution for relational data. FACE can transform a complex distribution over continuous random variables into a simple distribution (e.g., multivariate normal distribution) and use the probability density to estimate the cardinality for both sequential queries and parallel queries. First, we design a dequantization method to make data more "continuous." Second, we propose encoding and indexing techniques to handle Like predicates for string data. Third, we propose a Monte Carlo method to estimate the cardinality based on the FACE model. Fourth, we propose a grouping technique to process parallel queries. Fifth, we discuss how to support join queries. Experimental results show that our method significantly outperforms existing approaches in terms of estimation accuracy while keeping similar latency and model size.
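The abstract names two steps that a short sketch may make more concrete: dequantizing discrete values so a continuous density can be fitted, and integrating that density over a query range by Monte Carlo sampling. The Python below is only an illustration under stated assumptions, not the paper's implementation. The function names (`dequantize`, `standard_normal_density`, `estimate_cardinality`) are hypothetical, a fixed standard normal density stands in for a trained normalizing flow, and plain uniform sampling over the query box is assumed because the abstract does not say which sampling scheme FACE uses.

```python
import numpy as np


def dequantize(column, rng):
    """Spread each integer-coded value v uniformly over [v, v + 1).

    Assumption: uniform-noise dequantization, a common choice for flows;
    the abstract does not specify the paper's exact method.
    """
    column = np.asarray(column)
    return column + rng.uniform(0.0, 1.0, size=column.shape)


def standard_normal_density(x):
    """Stand-in for a learned model: density of independent N(0, 1) columns."""
    x = np.asarray(x, dtype=float)
    dim = x.shape[1]
    return np.exp(-0.5 * np.sum(x ** 2, axis=1)) / (2.0 * np.pi) ** (dim / 2.0)


def estimate_cardinality(density, low, high, n_rows, n_samples, rng):
    """Estimate how many of n_rows tuples fall inside the box [low, high].

    The selectivity is the integral of the density over the box, which is
    approximated as box_volume * mean(density at uniform samples).
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    samples = rng.uniform(low, high, size=(n_samples, low.size))
    volume = np.prod(high - low)
    selectivity = volume * density(samples).mean()
    return n_rows * selectivity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(dequantize([0, 3, 3, 7], rng))
    # Range predicate -1 <= a <= 1 AND -1 <= b <= 1 on a 1,000,000-row table.
    print(estimate_cardinality(standard_normal_density,
                               [-1.0, -1.0], [1.0, 1.0],
                               n_rows=1_000_000, n_samples=200_000, rng=rng))
```

With a real model, `density` would be replaced by the flow's probability density function; the rest of the estimator would keep the same shape. Uniform sampling is the simplest choice and can be inefficient when the density is concentrated in a small part of the query region.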
Pages: 323-348
Page count: 26