Towards the efficient discovery of meaningful functional dependencies

Cited by: 5
Authors
Wei, Ziheng [1]
Link, Sebastian [1]
Affiliations
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
Keywords
Algorithm; Armstrong relation; Cover; Data profiling; Discovery; Functional dependency; Memory consumption; Missing data; Ranking; Redundancy; Runtime efficiency; RELATIONAL DATA; APPROXIMATE; ALGORITHM; MINIMUM; COVERS
DOI
10.1016/j.is.2023.102224
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose the first framework for discovering the set of meaningful functional dependencies from data. This set contains the true positives among the set of functional dependencies that hold on the given data. Based on new data structures and original techniques for the dynamic computation of stripped partitions, we devise a new hybridization strategy that results in the first algorithm that can explore trade-offs between runtime efficiency and main memory usage. Using real-world benchmark data, we demonstrate that our algorithm outperforms the previous state-of-the-art in terms of runtime efficiency and scalability in the number of rows and columns. We propose the number of redundant data values for ranking the functional dependencies that have been discovered. Our ranking helps separate false from true positives for applications such as schema design. The remaining meaningful functional dependencies consist of the false negatives, that is, those functional dependencies that are only violated by the given data due to data inconsistency. We propose the computation of informative Armstrong relations to draw the attention of users to violations of functional dependencies that are meaningful for some application. We order the pairs of records in Armstrong relations based on the amount of inconsistency and redundancy caused by the associated functional dependencies, thereby directing attention to those most likely to be meaningful. As we demonstrate, these samples help separate false from true negatives; their perfect recall of meaningful functional dependencies can lead to a more complete acquisition of requirements and identification of dirty data; and they may be computed faster than covers of functional dependencies. In addition, we demonstrate for the first time that non-redundant covers can offer a representation of functional dependencies that is much smaller than the left-hand side reduced covers used in previous work. Such a compact representation of the output is easier for humans to understand and explore. We report all our results for different interpretations of missing values. (c) 2023 Elsevier Ltd. All rights reserved.
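To make two notions from the abstract concrete, the following is a minimal Python sketch of a stripped partition and of a redundancy count that could be used to rank a functional dependency. It is not the authors' algorithm or data structure. The function names, the toy relation, and the counting rule (every right-hand side value whose row shares its left-hand side values with another row counts as redundant) are assumptions made for illustration, and missing values are not handled.

```python
from collections import defaultdict


def stripped_partition(rows, attrs):
    """Group row indices by their values on attrs and drop singleton groups."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(index)
    return [group for group in groups.values() if len(group) > 1]


def holds(rows, lhs, rhs):
    """lhs -> rhs holds iff every class of the stripped partition agrees on rhs."""
    return all(
        len({rows[i][rhs] for i in group}) == 1
        for group in stripped_partition(rows, lhs)
    )


def redundant_values(rows, lhs, rhs):
    """Assumed measure: count rhs occurrences fixed by another row with equal lhs.

    Only meaningful when lhs -> rhs holds on rows.
    """
    if not holds(rows, lhs, rhs):
        raise ValueError("functional dependency does not hold on the data")
    return sum(len(group) for group in stripped_partition(rows, lhs))


# Hypothetical toy relation, not taken from the paper's benchmarks.
rows = [
    {"city": "Auckland", "zip": "1010", "country": "NZ"},
    {"city": "Auckland", "zip": "1010", "country": "NZ"},
    {"city": "Wellington", "zip": "6011", "country": "NZ"},
    {"city": "Sydney", "zip": "2000", "country": "AU"},
]

print(stripped_partition(rows, ["zip"]))      # classes of rows agreeing on zip
print(holds(rows, ["zip"], "city"))           # zip -> city
print(holds(rows, ["country"], "city"))       # country -> city
print(redundant_values(rows, ["zip"], "city"))
```

Under this assumed measure, a dependency that holds only because its left-hand side is nearly unique produces few or no redundant values and would rank low, which matches the intuition that such dependencies tend to be accidental.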
Pages: 27