Discovery of Genuine Functional Dependencies from Relational Data with Missing Values

Times cited: 30
Authors
Berti-Equille, Laure [1]
Harmouch, Nazar [2]
Naumann, Felix [2]
Novelli, Noel [1]
Saravanan [3]
Affiliations
[1] Aix Marseille Univ, CNRS, LIS, Marseille, France
[2] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
[3] HBKU, QCRI, Doha, Qatar
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Vol. 11 / No. 08
Keywords
IMPUTATION;
DOI
10.14778/3204028.3204032
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When existing FD discovery algorithms are used, some genuine FDs may not be detected precisely because of missing values, while some non-genuine FDs may be discovered even though they arise only from missing values under a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This score can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method through extensive experiments over various real-world and semi-synthetic datasets. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.
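The abstract notes that the NULL semantics used during discovery changes which FDs appear to hold. The sketch below illustrates only that effect; it is not the authors' genuineness algorithm, which the abstract does not specify. The hypothetical function `fd_holds` checks whether an FD X → Y holds on a small in-memory relation under two common NULL semantics: NULLs equal to one another (`null_eq_null=True`) or each NULL distinct from every value, including other NULLs (`null_eq_null=False`). Rows are assumed to be Python dicts in which `None` marks a missing value, and the `zip`/`city` data is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence

# Hypothetical row representation: attribute name -> value, None = missing (NULL).
Row = Dict[str, Optional[Hashable]]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: str, null_eq_null: bool) -> bool:
    """Return True if the FD lhs -> rhs holds on rows under the chosen NULL semantics."""
    groups: Dict[tuple, List[Optional[Hashable]]] = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if not null_eq_null and any(v is None for v in key):
            # Under NULL != NULL, a tuple with a NULL on the LHS agrees with no
            # other tuple on the LHS, so it can never take part in a violation.
            continue
        groups[key].append(row[rhs])
    for values in groups.values():
        if len(values) < 2:
            continue  # a single tuple cannot violate the FD
        if null_eq_null:
            # NULLs compare equal, so only genuinely different RHS values violate.
            if len(set(values)) > 1:
                return False
        else:
            # A NULL on the RHS differs from every other value, even another NULL.
            if any(v is None for v in values) or len(set(values)) > 1:
                return False
    return True


# Two tuples with a missing zip but different cities.
rows = [
    {"zip": "14482", "city": "Potsdam"},
    {"zip": "14482", "city": "Potsdam"},
    {"zip": None, "city": "Marseille"},
    {"zip": None, "city": "Doha"},
]
print(fd_holds(rows, ["zip"], "city", null_eq_null=True))   # False
print(fd_holds(rows, ["zip"], "city", null_eq_null=False))  # True

# Two tuples with the same zip and a missing city.
rows2 = [
    {"zip": "13001", "city": None},
    {"zip": "13001", "city": None},
]
print(fd_holds(rows2, ["zip"], "city", null_eq_null=True))   # True
print(fd_holds(rows2, ["zip"], "city", null_eq_null=False))  # False
```

With NULLs treated as equal, the two tuples missing `zip` disagree on `city`, so `zip → city` is rejected; with NULLs treated as distinct, the same FD is accepted. The second relation flips the outcome in the other direction. Deciding which result reflects the real-world dependency is the question the paper's genuineness score is meant to address.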
Pages: 880-892
Number of pages: 13