Discovery of Genuine Functional Dependencies from Relational Data with Missing Values

Cited by: 30
Authors
Berti-Equille, Laure [1]
Harmouch, Nazar [2]
Naumann, Felix [2]
Novelli, Noel [1]
Saravanan [3]
Affiliations
[1] Aix Marseille Univ, CNRS, LIS, Marseille, France
[2] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
[3] HBKU, QCRI, Doha, Qatar
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Vol. 11 / Issue 08
Keywords
IMPUTATION;
DOI
10.14778/3204028.3204032
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs cannot be detected precisely because of missing values, and some non-genuine FDs can be discovered even though they are caused by missing values interpreted under a certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This score can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method over various real-world and semi-synthetic datasets with extensive experiments. The results show that our method performs well for relatively large FD sets and is able to accurately capture genuine FDs.
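To illustrate why the choice of NULL semantics matters for FD discovery, the following minimal sketch checks whether a single FD holds on a toy relation under two interpretations of missing values: NULL = NULL (all missing values are treated as equal) and NULL ≠ NULL (every missing value is treated as distinct). The function name `fd_holds`, the parameter `null_eq_null`, and the sample data are hypothetical and chosen only for illustration; this is not the genuineness score or any algorithm from the paper.

```python
from typing import Hashable, Optional, Sequence

Row = dict[str, Optional[Hashable]]


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str],
             null_eq_null: bool) -> bool:
    """Return True if the FD lhs -> rhs holds on rows.

    null_eq_null=True  : all None values compare equal (NULL = NULL).
    null_eq_null=False : each None is replaced by a fresh object, so it
                         never equals any other value (NULL != NULL).
    """
    def norm(value):
        # Keep real values; keep None only under NULL = NULL semantics.
        if value is not None or null_eq_null:
            return value
        return object()  # unique marker, equal only to itself

    seen = {}  # maps each LHS value combination to its first RHS values
    for row in rows:
        left = tuple(norm(row[a]) for a in lhs)
        right = tuple(norm(row[a]) for a in rhs)
        if left in seen and seen[left] != right:
            return False  # same LHS, different RHS: the FD is violated
        seen.setdefault(left, right)
    return True


# Toy relation (assumed data): two rows are missing their zip code.
rows = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": None, "city": "Potsdam"},
    {"zip": None, "city": "Doha"},
]

print(fd_holds(rows, ["zip"], ["city"], null_eq_null=True))
print(fd_holds(rows, ["zip"], ["city"], null_eq_null=False))
```

On this toy data, zip -> city is violated under NULL = NULL, because the two rows with a missing zip code share a left-hand side but disagree on the city, whereas it holds under NULL ≠ NULL, because no two rows share a left-hand side. The same data can therefore yield different sets of valid FDs depending on how missing values are interpreted.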
Pages: 880 - 892
Page count: 13
Related papers
50 in total
  • [31] Mining relaxed functional dependencies from data
    Loredana Caruccio
    Vincenzo Deufemia
    Giuseppe Polese
    Data Mining and Knowledge Discovery, 2020, 34 : 443 - 477
  • [32] Mining relaxed functional dependencies from data
    Caruccio, Loredana
    Deufemia, Vincenzo
    Polese, Giuseppe
    DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 34 (02) : 443 - 477
  • [33] Discovery of Temporal Graph Functional Dependencies
    Noronha, Levin
    Chiang, Fei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021 : 3348 - 3352
  • [34] Algorithms for the discovery of embedded functional dependencies
    Wei, Ziheng
    Hartmann, Sven
    Link, Sebastian
    VLDB JOURNAL, 2021, 30 (06) : 1069 - 1093
  • [35] Algorithms for the discovery of embedded functional dependencies
    Ziheng Wei
    Sven Hartmann
    Sebastian Link
    The VLDB Journal, 2021, 30 : 1069 - 1093
  • [36] Incremental Discovery of Imprecise Functional Dependencies
    Caruccio, Loredana
    Cirillo, Stefano
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2020, 12 (04):
  • [37] Discovery Algorithms for Embedded Functional Dependencies
    Wei, Ziheng
    Hartmann, Sven
    Link, Sebastian
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020 : 833 - 843
  • [38] Efficient Discovery of Ontology Functional Dependencies
    Baskaran, Sridevi
    Keller, Alexander
    Chiang, Fei
    Golab, Lukasz
    Szlichta, Jaroslaw
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017 : 1847 - 1856
  • [39] MISSING VALUES IN DATA
    RACKLEY, K
    SIAM REVIEW, 1974, 16 (01) : 136 - 136
  • [40] A relational data model with fuzzy inheritance dependencies
    Liu, WY
    FUZZY SETS AND SYSTEMS, 1997, 89 (02) : 205 - 213