Towards the efficient discovery of meaningful functional dependencies

Cited by: 5
Authors
Wei, Ziheng [1]
Link, Sebastian [1]
Affiliation
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
Keywords
Algorithm; Armstrong relation; Cover; Data profiling; Discovery; Functional dependency; Memory consumption; Missing data; Ranking; Redundancy; Runtime efficiency; RELATIONAL DATA; APPROXIMATE; ALGORITHM; MINIMUM; COVERS
DOI
10.1016/j.is.2023.102224
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We propose the first framework for discovering the set of meaningful functional dependencies from data. This set contains the true positives among the functional dependencies that hold on the given data. Based on new data structures and original techniques for the dynamic computation of stripped partitions, we devise a new hybridization strategy that yields the first algorithm able to explore trade-offs between runtime efficiency and main memory usage. Using real-world benchmark data, we demonstrate that our algorithm outperforms the previous state of the art in runtime efficiency and in scalability in the number of rows and columns. We propose ranking the discovered functional dependencies by the number of redundant data values they cause. Our ranking helps separate false from true positives for applications such as schema design. The remaining meaningful functional dependencies are the false negatives, that is, functional dependencies that the given data violates only because of data inconsistency. We propose computing informative Armstrong relations to draw users' attention to violations of functional dependencies that are meaningful for some application. We order the pairs of records in Armstrong relations by the amount of inconsistency and redundancy caused by the associated functional dependencies, thereby drawing attention to those most likely to be meaningful. As we demonstrate, these samples help separate false from true negatives; their perfect recall of meaningful functional dependencies can lead to a more complete acquisition of requirements and identification of dirty data; and they may be computed faster than covers of functional dependencies. In addition, we demonstrate for the first time that non-redundant covers can offer a representation of functional dependencies that is much smaller than the left-hand side reduced covers used in previous work.
Such a compact representation of the output is easier for humans to understand and explore. We report all our results for different interpretations of missing values. © 2023 Elsevier Ltd. All rights reserved.
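The abstract relies on three notions: stripped partitions, ranking by the number of redundant data values, and different interpretations of missing values. The Python sketch below illustrates how they fit together, under stated assumptions. It uses the textbook stripped-partition validity check known from TANE-style algorithms, not the authors' hybrid algorithm or data structures. The function names `stripped_partition`, `fd_holds` and `redundant_values` are hypothetical. A right-hand side value counts as redundant when another row shares its left-hand side values, which is our simplified reading that ignores missing right-hand side values. Missing values are modeled as `None`, under either a null = null or a null ≠ null interpretation.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import List, Optional, Tuple

Row = Tuple[Optional[Hashable], ...]  # None marks a missing value (assumption)


def stripped_partition(rows: List[Row], lhs: Tuple[int, ...],
                       null_eq_null: bool = True) -> List[List[int]]:
    """Classes of row indices that agree on the columns in lhs, with singleton
    classes stripped, since a lone row can never violate an FD."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in lhs)
        if not null_eq_null and None in key:
            continue  # under null != null this row agrees with no other row on lhs
        classes[key].append(i)
    return [c for c in classes.values() if len(c) > 1]


def fd_holds(rows: List[Row], lhs: Tuple[int, ...], rhs: int,
             null_eq_null: bool = True) -> bool:
    """Check lhs -> rhs: all rows in one class must agree on column rhs."""
    for cls in stripped_partition(rows, lhs, null_eq_null):
        values = {rows[i][rhs] for i in cls}
        if len(values) > 1:
            return False
        if not null_eq_null and None in values:
            return False  # two missing rhs values do not agree under null != null
    return True


def redundant_values(rows: List[Row], lhs: Tuple[int, ...], rhs: int,
                     null_eq_null: bool = True) -> Optional[int]:
    """Hypothetical redundancy score: non-null rhs values that another row with
    the same lhs values already determines; None if the FD does not hold."""
    if not fd_holds(rows, lhs, rhs, null_eq_null):
        return None
    return sum(1 for cls in stripped_partition(rows, lhs, null_eq_null)
               for i in cls if rows[i][rhs] is not None)


# Hypothetical toy relation with columns (name, dept, room).
rows = [
    ("Alice", "Math", "R1"),
    ("Bob", "Math", "R1"),
    ("Carol", "Physics", "R2"),
    ("Dave", "Math", "R1"),
    ("Eve", None, "R3"),
    ("Fay", None, "R4"),
]
candidates = [((1,), 2), ((2,), 1), ((0,), 1)]  # dept->room, room->dept, name->dept

for null_eq_null in (True, False):
    scored = [(fd, redundant_values(rows, *fd, null_eq_null=null_eq_null))
              for fd in candidates]
    # Keep only FDs that hold, ranked by descending redundancy.
    ranked = sorted((s for s in scored if s[1] is not None), key=lambda s: -s[1])
    print(null_eq_null, ranked)
```

In this toy relation, dept → room fails under null = null, because Eve and Fay share a missing dept but sit in different rooms. It holds under null ≠ null, where their missing values are treated as distinct. This shows, on a small scale, how the interpretation of missing values changes which functional dependencies are discovered and how they rank.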
Pages: 27
Related papers (50 in total; first 10 listed)
  • [1] Efficient Discovery of Ontology Functional Dependencies
    Baskaran, Sridevi
    Keller, Alexander
    Chiang, Fei
    Golab, Lukasz
    Szlichta, Jaroslaw
    CIKM'17: Proceedings of the 2017 ACM Conference on Information and Knowledge Management, 2017: 1847-1856
  • [2] Efficient discovery of functional dependencies and Armstrong relations
    Lopes, S
    Petit, JM
    Lakhal, L
    Advances in Database Technology - EDBT 2000, Proceedings, 2000, 1777: 350-364
  • [3] Efficient discovery of functional dependencies with degrees of satisfaction
    Wei, Q
    Chen, GQ
    International Journal of Intelligent Systems, 2004, 19 (11): 1089-1110
  • [4] Efficient Discovery of Functional Dependencies on Massive Data
    Wan, Xiaolong
    Han, Xixian
    Wang, Jinbao
    Li, Jianzhong
    IEEE Transactions on Knowledge and Data Engineering, 2024, 36 (01): 107-121
  • [5] Efficient discovery of functional and approximate dependencies using partitions
    Huhtala, Y
    Karkkainen, J
    Porkka, P
    Toivonen, H
    14th International Conference on Data Engineering, Proceedings, 1998: 392-401
  • [6] Efficient Discovery of Matching Dependencies
    Schirmer, Philipp
    Papenbrock, Thorsten
    Koumarelas, Ioannis
    Naumann, Felix
    ACM Transactions on Database Systems, 2020, 45 (03)
  • [7] Efficient Discovery of Approximate Dependencies
    Kruse, Sebastian
    Naumann, Felix
    Proceedings of the VLDB Endowment, 2018, 11 (07): 759-772
  • [8] Discovery of Field Functional Dependencies
    Sun, Jizhou
    Li, Jianzhong
    Gao, Hong
    Liu, Xianmin
    2015 10th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), 2015: 448-455
  • [9] Discovery and Ranking of Functional Dependencies
    Wei, Ziheng
    Link, Sebastian
    2019 IEEE 35th International Conference on Data Engineering (ICDE 2019), 2019: 1526-1537
  • [10] Distributed Discovery of Functional Dependencies
    Saxena, Hemant
    Golab, Lukasz
    Ilyas, Ihab F.
    2019 IEEE 35th International Conference on Data Engineering (ICDE 2019), 2019: 1590-1593