Towards the efficient discovery of meaningful functional dependencies

Cited by: 5
Authors
Wei, Ziheng [1]
Link, Sebastian [1]
Affiliations
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
Keywords
Algorithm; Armstrong relation; Cover; Data profiling; Discovery; Functional dependency; Memory consumption; Missing data; Ranking; Redundancy; Runtime efficiency; RELATIONAL DATA; APPROXIMATE; ALGORITHM; MINIMUM; COVERS;
DOI
10.1016/j.is.2023.102224
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
We propose the first framework for discovering the set of meaningful functional dependencies from data. This set contains the true positives among the set of functional dependencies that hold on the given data. Based on new data structures and original techniques for the dynamic computation of stripped partitions, we devise a new hybridization strategy that results in the first algorithm that can explore trade-offs between runtime efficiency and main memory usage. Using real-world benchmark data, we demonstrate that our algorithm outperforms the previous state-of-the-art in terms of runtime efficiency, and scalability in the number of rows and columns. We propose the number of redundant data values for ranking the functional dependencies that have been discovered. Our ranking helps separate false from true positives for applications, such as schema design. The remaining meaningful functional dependencies consist of the false negatives, that is, those functional dependencies that are only violated by the given data due to data inconsistency. We propose the computation of informative Armstrong relations to draw the attention of users to violations of functional dependencies that are meaningful for some application. We order the pairs of records in Armstrong relations based on the amount of inconsistency and redundancy caused by the associated functional dependencies, thereby pointing the attention to those most likely to be meaningful. As we demonstrate, these samples help separate false from true negatives, their perfect recall of meaningful functional dependencies can lead to a more complete acquisition of requirements and identification of dirty data, and may be computed faster than covers of functional dependencies. In addition, we demonstrate for the first time that non-redundant covers can offer a representation of functional dependencies that is much smaller than left-hand side reduced covers used in previous work. Such a compact representation of the output is easier to understand and explore by humans. We report all our results for different interpretations of missing values. (c) 2023 Elsevier Ltd. All rights reserved.
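As a rough illustration of two notions the abstract relies on, the sketch below checks a functional dependency X -> A with a stripped partition (row groups that agree on X, with singleton groups dropped) and then counts redundant A-values as a ranking score. This is not the authors' algorithm: the function names, the toy table, and the simplified redundancy count (every A-value in a group of two or more rows is treated as redundant) are assumptions made for this example, and missing values are not handled.

```python
from collections import defaultdict

def stripped_partition(rows, attrs):
    """Group row indices by their values on attrs; drop singleton groups."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return [g for g in groups.values() if len(g) > 1]

def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds iff every class of the stripped partition agrees on rhs."""
    return all(
        len({rows[i][rhs] for i in cls}) == 1
        for cls in stripped_partition(rows, lhs)
    )

def redundant_values(rows, lhs, rhs):
    """Simplified score (assumption): rhs-values in non-singleton lhs-classes."""
    if not fd_holds(rows, lhs, rhs):
        return 0
    return sum(len(cls) for cls in stripped_partition(rows, lhs))

# Hypothetical toy table, invented for illustration only.
rows = [
    {"zip": "1010", "city": "Auckland", "name": "Ann"},
    {"zip": "1010", "city": "Auckland", "name": "Bob"},
    {"zip": "6011", "city": "Wellington", "name": "Cat"},
]

zip_city_score = redundant_values(rows, ["zip"], "city")
name_zip_score = redundant_values(rows, ["name"], "zip")
```

On this toy table both zip -> city and name -> zip hold, but only the first causes any redundant values, so under this simplified score it would rank higher, while city -> name is violated by the first two rows.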
Pages: 27
Related papers
50 in total
  • [31] Efficient Discovery of Differential Dependencies Through Association Rules Mining
    Kwashie, Selasi
    Liu, Jixue
    Li, Jiuyong
    Ye, Feiyue
    DATABASES THEORY AND APPLICATIONS, 2015, 9093 : 3 - 15
  • [32] Efficient discovery of nonlinear dependencies in a combinatorial catalyst data set
    Cawse, JN
    Baerns, M
    Holena, M
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (01): : 143 - 146
  • [33] Provenance-aware Discovery of Functional Dependencies on Integrated Views
    Comignani, Ugo
    Berti-Equille, Laure
    Novelli, Noel
    Bonifati, Angela
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 621 - 633
  • [34] TANE: An efficient algorithm for discovering functional and approximate dependencies
    Huhtala, Y
    Kärkkäinen, J
    Porkka, P
    Toivonen, H
    COMPUTER JOURNAL, 1999, 42 (02): : 100 - 111
  • [35] FUN: An efficient algorithm for mining functional and embedded dependencies
    Novelli, N
    Cicchetti, R
    DATABASE THEORY - ICDT 2001, PROCEEDINGS, 2001, 1973 : 189 - 203
  • [36] An Efficient Algorithm for Reasoning about Fuzzy Functional Dependencies
    Cordero, P.
    Enciso, M.
    Mora, A.
    de Guzman, I. Perez
    Rodriguez-Jimenez, J. M.
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2011, PT II, 2011, 6692 : 412 - 420
  • [37] Elaboration on functional dependencies: Functional dependencies are dead, long live functional dependencies!
    Karachalias G.
    Schrijvers T.
    ACM SIGPLAN Not., 10 (133-147): : 133 - 147
  • [38] Elaboration on Functional Dependencies: Functional Dependencies Are Dead, Long Live Functional Dependencies!
    Karachalias, Georgios
    Schrijvers, Tom
    ACM SIGPLAN NOTICES, 2017, 52 (10) : 133 - 147
  • [39] Towards Efficient Heap Overflow Discovery
    Jia, Xiangkun
    Zhang, Chao
    Su, Purui
    Yang, Yi
    Huang, Huafeng
    Feng, Dengguo
    PROCEEDINGS OF THE 26TH USENIX SECURITY SYMPOSIUM (USENIX SECURITY '17), 2017, : 989 - 1006
  • [40] A subgraph-based approach towards functional dependencies for XML
    Hartmann, S
    Link, S
    Kirchberg, M
    7TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL IX, PROCEEDINGS: COMPUTER SCIENCE AND ENGINEERING: II, 2003, : 200 - 205