Discovery of Genuine Functional Dependencies from Relational Data with Missing Values

Cited by: 30
Authors
Berti-Equille, Laure [1]
Harmouch, Nazar [2]
Naumann, Felix [2]
Novelli, Noel [1]
Saravanan [3]
Affiliations
[1] Aix Marseille Univ, CNRS, LIS, Marseille, France
[2] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
[3] HBKU, QCRI, Doha, Qatar
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Volume 11 / Issue 08
Keywords
IMPUTATION;
DOI
10.14778/3204028.3204032
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. With existing FD discovery algorithms, some genuine FDs may go undetected because of missing values, while some non-genuine FDs may be discovered even though they are merely artifacts of missing values interpreted under a particular NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This score can be used to identify the genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method in extensive experiments over various real-world and semi-synthetic datasets. The results show that our method performs well for relatively large FD sets and accurately captures genuine FDs.
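The effect of NULL semantics described in the abstract can be illustrated with a minimal sketch. It is not the paper's genuineness score or any of its algorithms. It only checks whether one FD holds under two common interpretations of missing values (NULL = NULL versus NULL != NULL). The function name `fd_holds`, the `null_equals_null` flag, and the toy `rows` table are all assumptions made for illustration.

```python
from itertools import count

def fd_holds(rows, lhs, rhs, null_equals_null=True):
    """Check whether the FD lhs -> rhs holds on rows (a list of dicts).

    Missing values are represented by None (an assumption of this sketch).
    """
    fresh = count()  # supplies a unique id for each NULL when NULL != NULL

    def key(row, attrs):
        values = []
        for attr in attrs:
            v = row[attr]
            if v is None:
                # NULL = NULL: all missing values share one marker.
                # NULL != NULL: every missing value gets its own marker.
                v = ("NULL",) if null_equals_null else ("NULL", next(fresh))
            else:
                v = ("VAL", v)
            values.append(v)
        return tuple(values)

    seen = {}  # maps each left-hand-side value to the first right-hand side seen
    for row in rows:
        left, right = key(row, lhs), key(row, rhs)
        if seen.setdefault(left, right) != right:
            return False  # two rows agree on lhs but differ on rhs
    return True

# Hypothetical toy table: two rows are missing their zip code.
rows = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": None, "city": "Potsdam"},
    {"zip": None, "city": "Doha"},
]

print(fd_holds(rows, ["zip"], ["city"], null_equals_null=True))   # False
print(fd_holds(rows, ["zip"], ["city"], null_equals_null=False))  # True
```

On this toy table the same dependency zip -> city is rejected when all NULLs are treated as equal and accepted when each NULL is treated as a distinct value, which is the kind of ambiguity a genuineness score is meant to address.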
Pages: 880 - 892
Number of pages: 13