Decentralized and Incremental Discovery of Relaxed Functional Dependencies Using Bitwise Similarity

Cited by: 0
Authors
Breve, Bernardo [1]
Caruccio, Loredana [1]
Cirillo, Stefano [1]
Deufemia, Vincenzo [1]
Polese, Giuseppe [1]
Affiliations
[1] Univ Salerno, Dept Comp Sci, I-84084 Fisciano, Italy
Keywords
Heuristic algorithms; Partitioning algorithms; Metadata; Vectors; Lattices; Task analysis; Symbols; Data profiling; relaxed functional dependencies; incremental scenarios; bitwise similarities
DOI
10.1109/TKDE.2024.3403928
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the past decade, the definition of Functional Dependency (fd) has been extended in numerous ways, culminating in the Relaxed Functional Dependency (rfd), which offers more flexible constraints than traditional fds. This increased flexibility makes rfds well-suited for exploring and profiling data in datasets with lower data quality. However, efficiently identifying rfds within dynamic data sources presents a significant challenge, as it requires processing an entire dataset from scratch whenever modifications occur. To tackle this problem, incremental discovery algorithms have been defined, but their performance often degrades as the frequency and size of update batches increase. This article presents a new algorithm, D-IndiBits, which relies on a new decentralized architecture to balance the workload of the incremental discovery process of IndiBits, which in turn uses bitwise operators to compute attribute similarities. Experiments demonstrate D-IndiBits's effectiveness compared to fd and rfd discovery algorithms on both static and dynamic real-world data. With modification batches of sizes 10k and 100k, D-IndiBits updates the set of rfds in a few seconds, whereas all other approaches often take more than 3 hours.
Pages: 7380-7398
Number of pages: 19