Towards the efficient discovery of meaningful functional dependencies

Cited by: 5
Authors
Wei, Ziheng [1]
Link, Sebastian [1]
Affiliations
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
Keywords
Algorithm; Armstrong relation; Cover; Data profiling; Discovery; Functional dependency; Memory consumption; Missing data; Ranking; Redundancy; Runtime efficiency; RELATIONAL DATA; APPROXIMATE; ALGORITHM; MINIMUM; COVERS
DOI
10.1016/j.is.2023.102224
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose the first framework for discovering the set of meaningful functional dependencies from data. This set contains the true positives among the set of functional dependencies that hold on the given data. Based on new data structures and original techniques for the dynamic computation of stripped partitions, we devise a new hybridization strategy that results in the first algorithm that can explore trade-offs between runtime efficiency and main memory usage. Using real-world benchmark data, we demonstrate that our algorithm outperforms the previous state-of-the-art in terms of runtime efficiency, and scalability in the number of rows and columns. We propose the number of redundant data values for ranking the functional dependencies that have been discovered. Our ranking helps separate false from true positives for applications, such as schema design. The remaining meaningful functional dependencies consist of the false negatives, that is, those functional dependencies that are only violated by the given data due to data inconsistency. We propose the computation of informative Armstrong relations to draw the attention of users to violations of functional dependencies that are meaningful for some application. We order the pairs of records in Armstrong relations based on the amount of inconsistency and redundancy caused by the associated functional dependencies, thereby pointing the attention to those most likely to be meaningful. As we demonstrate, these samples help separate false from true negatives, their perfect recall of meaningful functional dependencies can lead to a more complete acquisition of requirements and identification of dirty data, and may be computed faster than covers of functional dependencies. In addition, we demonstrate for the first time that non-redundant covers can offer a representation of functional dependencies that is much smaller than left-hand side reduced covers used in previous work. Such a compact representation of the output is easier to understand and explore by humans. We report all our results for different interpretations of missing values. © 2023 Elsevier Ltd. All rights reserved.
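The abstract names two building blocks without defining them: stripped partitions and the number of redundant data values used for ranking. The following is a minimal Python sketch of the textbook notions only, not the authors' algorithm or data structures. The function names, the toy table, and the simplified redundancy count (every right-hand side value in a row that shares its left-hand side values with another row) are assumptions made for illustration.

```python
from collections import defaultdict

def stripped_partition(rows, attrs):
    """Group row indices by their values on attrs; drop singleton groups."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return [g for g in groups.values() if len(g) > 1]

def holds(rows, lhs, rhs):
    """Check lhs -> rhs: rows that agree on lhs must also agree on rhs."""
    for group in stripped_partition(rows, lhs):
        if len({tuple(rows[i][a] for a in rhs) for i in group}) > 1:
            return False
    return True

def redundant_values(rows, lhs, rhs):
    """Simplified count (assumption): for an FD that holds, each rhs value
    in a row sharing its lhs values with another row is counted once."""
    return sum(len(g) for g in stripped_partition(rows, lhs)) * len(rhs)

# Hypothetical toy table.
rows = [
    {"emp": "a", "dept": "X", "mgr": "m1"},
    {"emp": "b", "dept": "X", "mgr": "m1"},
    {"emp": "c", "dept": "Y", "mgr": "m2"},
]

print(holds(rows, ["dept"], ["mgr"]), redundant_values(rows, ["dept"], ["mgr"]))
print(holds(rows, ["emp"], ["dept"]), redundant_values(rows, ["emp"], ["dept"]))
print(holds(rows, ["dept"], ["emp"]))
```

In this toy table both dept -> mgr and emp -> dept hold, but only the first causes any redundant values under the simplified count, which hints at why such a count could help rank discovered dependencies.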
Pages: 27