Querying Incomplete Numerical Data: Between Certain and Possible Answers

Authors
Console, Marco [1]
Libkin, Leonid [2, 3, 4]
Peterfreund, Liat [5]
Affiliations
[1] Sapienza Univ Rome, Rome, Italy
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[3] PSL Univ, RelationalAI, Paris, France
[4] PSL Univ, ENS, Paris, France
[5] Univ Gustave Eiffel, LIGM, CNRS, Champs Sur Marne, France
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); EU Horizon 2020
Keywords
Nulls; numerical attributes; aggregate queries; probabilistic databases; approximations; certain and possible answers; LANGUAGES;
DOI
10.1145/3584372.3588660
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Queries with aggregation and arithmetic operations, as well as incomplete data, are common in real-world databases, but we lack a good understanding of how they should interact. On the one hand, systems based on SQL provide ad-hoc rules for numerical nulls; on the other, theoretical research largely concentrates on the standard notions of certain and possible answers. In the presence of numerical attributes and aggregates, however, these answers are often meaningless, returning either too little or too much. Our goal is to define a principled framework for databases with numerical nulls and for answering queries with arithmetic and aggregation over them. Towards this goal, we assume that missing values in numerical attributes are given by probability distributions associated with marked nulls. This yields a model of probabilistic bag databases in which tuples are not necessarily independent, since nulls can repeat. We provide a general compositional framework for query answering and then concentrate on queries that resemble standard SQL with arithmetic and aggregation. We show that these queries are measurable and that their outputs have a finite representation. Moreover, since the classical forms of answers provide little information in the numerical setting, we look at the probability that numerical values in output tuples belong to specific intervals. Even though exact computation of these probabilities is intractable, we show efficient approximation algorithms to compute them.
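To make the abstract's central question concrete, the following is a minimal illustrative sketch, not the paper's algorithm: it uses plain Monte Carlo sampling to estimate the probability that a SUM over a bag containing marked nulls falls inside an interval. The function name `estimate_interval_probability`, the string encoding of nulls, and the uniform distribution chosen for the null `n1` are all assumptions made for illustration only.

```python
import random


def estimate_interval_probability(rows, null_samplers, lo, hi, trials=20000, seed=0):
    """Estimate P(lo <= SUM(rows) <= hi) by naive Monte Carlo sampling.

    rows          -- bag of values; a float is a known value, a str names a marked null
    null_samplers -- maps each null name to a function rng -> float (hypothetical encoding)
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one value per marked null per trial. A null that occurs in
        # several tuples gets the same value everywhere, so tuples are correlated.
        valuation = {name: sample(rng) for name, sample in null_samplers.items()}
        total = sum(valuation[v] if isinstance(v, str) else v for v in rows)
        if lo <= total <= hi:
            hits += 1
    return hits / trials


# Hypothetical numerical column: two known values and the null n1 occurring twice.
rows = [10.0, "n1", "n1", 5.0]
# Assumed distribution for n1: uniform on [0, 10].
null_samplers = {"n1": lambda rng: rng.uniform(0.0, 10.0)}

estimate = estimate_interval_probability(rows, null_samplers, 20.0, 30.0)
print(estimate)
```

In this toy instance the sum equals 15 + 2·n1, so it lies in [20, 30] exactly when n1 lies in [2.5, 7.5], and the estimate should be close to 0.5. The certain answer for this query would be empty and the possible answers would span a whole range of sums, which is the gap the interval probabilities are meant to fill.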
Pages: 349-358
Number of pages: 10