Decentralized and Incremental Discovery of Relaxed Functional Dependencies Using Bitwise Similarity

Times cited: 0
Authors
Breve, Bernardo [1]
Caruccio, Loredana [1]
Cirillo, Stefano [1]
Deufemia, Vincenzo [1]
Polese, Giuseppe [1]
Affiliation
[1] Univ Salerno, Dept Comp Sci, I-84084 Fisciano, Italy
Keywords
Heuristic algorithms; Partitioning algorithms; Metadata; Vectors; Lattices; Task analysis; Symbols; Data profiling; relaxed functional dependencies; incremental scenarios; bitwise similarities
DOI
10.1109/TKDE.2024.3403928
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the past decade, the definition of Functional Dependency (fd) has been extended many times, culminating in the introduction of the Relaxed Functional Dependency (rfd), which offers more flexible constraints than traditional fds. This flexibility makes rfds well suited to exploring and profiling datasets with lower data quality. However, discovering rfds efficiently in dynamic data sources is a significant challenge, because it requires reprocessing the entire dataset from scratch whenever modifications occur. Incremental discovery algorithms have been proposed to tackle this problem, but their performance often degrades as the frequency and size of update batches increase. This article presents a new algorithm, D-IndiBits, which relies on a new decentralized architecture to balance the workload of the incremental discovery process of IndiBits, an algorithm that computes attribute similarities with bitwise operators. Experiments on both static and dynamic real-world data demonstrate the effectiveness of D-IndiBits compared to fd and rfd discovery algorithms. With update batches of 10k and 100k modifications, D-IndiBits updates the set of rfds in a few seconds, whereas all other approaches often take more than 3 hours.
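The abstract does not spell out how IndiBits encodes similarities with bitwise operators, so the following is only a toy Python sketch under simplifying assumptions that are not taken from the paper. It shows the general idea of an rfd checked over tuple pairs: each pair gets a bit mask whose bit i is set when the two tuples are similar on attribute i (numeric values within a per-attribute threshold, other values equal), and a dependency lhs -> rhs holds when every pair similar on all lhs attributes is also similar on rhs. The names `ATTRS`, `THRESHOLDS`, `pair_mask`, `holds`, and `IncrementalMasks`, and the threshold values, are hypothetical. The sketch is not decentralized and handles insertions only.

```python
# Toy sketch (hypothetical names and thresholds, NOT the IndiBits encoding):
# each tuple pair gets a bit mask with bit i set when the pair is similar on
# attribute ATTRS[i]; a relaxed dependency lhs -> rhs holds when every pair
# similar on all lhs attributes is also similar on rhs.

ATTRS = ["name", "age", "city"]                # bit i <-> ATTRS[i]
THRESHOLDS = {"name": 0, "age": 2, "city": 0}  # hypothetical tolerances


def similar(attr, a, b):
    """Numbers are similar within the attribute's threshold; other values must be equal."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) <= THRESHOLDS[attr]
    return a == b


def pair_mask(t1, t2):
    """Bit mask of the attributes on which tuples t1 and t2 are similar."""
    mask = 0
    for bit, attr in enumerate(ATTRS):
        if similar(attr, t1[attr], t2[attr]):
            mask |= 1 << bit
    return mask


def to_mask(attrs):
    """Bit mask with one bit set per attribute name in attrs."""
    mask = 0
    for attr in attrs:
        mask |= 1 << ATTRS.index(attr)
    return mask


def holds(masks, lhs, rhs):
    """True if every pair similar on all lhs attributes is also similar on rhs."""
    lhs_mask, rhs_mask = to_mask(lhs), to_mask([rhs])
    return all((m & rhs_mask) != 0 for m in masks if (m & lhs_mask) == lhs_mask)


class IncrementalMasks:
    """Keeps pair masks current under insertions: each new tuple is compared
    only with the tuples already stored, not the whole relation from scratch."""

    def __init__(self):
        self.tuples = []
        self.masks = []

    def insert(self, t):
        for old in self.tuples:
            self.masks.append(pair_mask(old, t))
        self.tuples.append(t)


store = IncrementalMasks()
for row in [
    {"name": "Ann", "age": 30, "city": "Rome"},
    {"name": "Bob", "age": 31, "city": "Rome"},
    {"name": "Cid", "age": 50, "city": "Oslo"},
]:
    store.insert(row)
print(holds(store.masks, ["age"], "city"))  # True: the only age-similar pair shares a city

store.insert({"name": "Dee", "age": 29, "city": "Oslo"})
print(holds(store.masks, ["age"], "city"))  # False: Dee is age-similar to Ann but not city-similar
```

With masks in place, validating a candidate dependency costs a couple of AND operations per pair, and an insertion only computes the masks for the new pairs. Storing one mask per pair is still quadratic in the number of tuples, so this sketch says nothing about how D-IndiBits distributes or bounds that work.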
Pages: 7380-7398
Number of pages: 19
Related papers (50 in total)
  • [21] Efficient Discovery of Ontology Functional Dependencies
    Baskaran, Sridevi
    Keller, Alexander
    Chiang, Fei
    Golab, Lukasz
    Szlichta, Jaroslaw
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017: 1847 - 1856
  • [22] Discovery Algorithms for Embedded Functional Dependencies
    Wei, Ziheng
    Hartmann, Sven
    Link, Sebastian
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020: 833 - 843
  • [23] Towards the efficient discovery of meaningful functional dependencies
    Wei, Ziheng
    Link, Sebastian
    INFORMATION SYSTEMS, 2023, 116
  • [24] Efficient discovery of functional dependencies and Armstrong relations
    Lopes, S
    Petit, JM
    Lakhal, L
    ADVANCES IN DATABASE TECHNOLOGY - EDBT 2000, PROCEEDINGS, 2000, 1777: 350 - 364
  • [25] Effective Pruning for the Discovery of Conditional Functional Dependencies
    Li, Jiuyong
    Liu, Jixue
    Toivonen, Hannu
    Yong, Jianming
    COMPUTER JOURNAL, 2013, 56 (03): 378 - 392
  • [26] Efficient discovery of functional dependencies with degrees of satisfaction
    Wei, Q
    Chen, GQ
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2004, 19 (11): 1089 - 1110
  • [27] Efficient Discovery of Functional Dependencies on Massive Data
    Wan, Xiaolong
    Han, Xixian
    Wang, Jinbao
    Li, Jianzhong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (01): 107 - 121
  • [28] Database mining for the discovery of extended functional dependencies
    Bosc, P
    Pivert, O
    Ughetto, L
    18TH INTERNATIONAL CONFERENCE OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY - NAFIPS, 1999: 580 - 584
  • [29] Approximate Discovery of Functional Dependencies for Large Datasets
    Bleifuss, Tobias
    Buelow, Susanne
    Frohnhofen, Johannes
    Risch, Julian
    Wiese, Georg
    Kruse, Sebastian
    Papenbrock, Thorsten
    Naumann, Felix
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016: 1803 - 1812
  • [30] Discovering Relaxed Functional Dependencies Based on Multi-Attribute Dominance
    Caruccio, Loredana
    Deufemia, Vincenzo
    Naumann, Felix
    Polese, Giuseppe
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (09): 3212 - 3228