Decentralized and Incremental Discovery of Relaxed Functional Dependencies Using Bitwise Similarity

Cited by: 0
Authors
Breve, Bernardo [1]
Caruccio, Loredana [1]
Cirillo, Stefano [1]
Deufemia, Vincenzo [1]
Polese, Giuseppe [1]
Affiliations
[1] Univ Salerno, Dept Comp Sci, I-84084 Fisciano, Italy
Keywords
Heuristic algorithms; Partitioning algorithms; Metadata; Vectors; Lattices; Task analysis; Symbols; Data profiling; relaxed functional dependencies; incremental scenarios; bitwise similarities
DOI
10.1109/TKDE.2024.3403928
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the past decade, the definition of functional dependency (FD) has been extended many times, culminating in the relaxed functional dependency (RFD), which imposes more flexible constraints than a traditional FD. This flexibility makes RFDs well suited to exploring and profiling datasets of lower data quality. However, discovering RFDs efficiently in dynamic data sources is a significant challenge, because the entire dataset must be reprocessed from scratch whenever modifications occur. Incremental discovery algorithms have been defined to tackle this problem, but their performance often degrades as the frequency and size of update batches increase. This article presents a new algorithm, D-IndiBits, which relies on a new decentralized architecture to balance the workload of the incremental discovery process of IndiBits, an algorithm that uses bitwise operators to compute attribute similarities. Experiments demonstrate the effectiveness of D-IndiBits compared with FD and RFD discovery algorithms on both static and dynamic real-world data. With modification batches of sizes 10k and 100k, D-IndiBits updates the set of RFDs in a few seconds, whereas all other approaches often take more than 3 hours.
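The abstract does not spell out how bitwise operators are used, so the following is only an illustrative sketch of the general idea, not the D-IndiBits or IndiBits implementation. It assumes numeric attributes, an absolute-difference distance, and one threshold per attribute; the names `similarity_mask` and `rfd_holds` are hypothetical. Each pair of tuples is summarized as a bit mask (bit i set when the pair is similar on attribute i), and a candidate RFD is rejected when some pair is similar on every left-hand-side attribute but not on the right-hand-side attribute.

```python
from itertools import combinations


def similarity_mask(t1, t2, thresholds):
    """Return a bit mask; bit i is set when |t1[i] - t2[i]| <= thresholds[i]."""
    mask = 0
    for i, (a, b, thr) in enumerate(zip(t1, t2, thresholds)):
        if abs(a - b) <= thr:
            mask |= 1 << i
    return mask


def rfd_holds(rows, thresholds, lhs, rhs):
    """Check a candidate dependency lhs -> rhs over all tuple pairs.

    lhs is a list of attribute indexes, rhs a single attribute index.
    This brute-force pairwise scan is an assumption made for illustration.
    """
    lhs_bits = sum(1 << i for i in lhs)
    rhs_bit = 1 << rhs
    for t1, t2 in combinations(rows, 2):
        mask = similarity_mask(t1, t2, thresholds)
        # Violation: similar on every LHS attribute, dissimilar on the RHS.
        if (mask & lhs_bits) == lhs_bits and not (mask & rhs_bit):
            return False
    return True


# Hypothetical toy data: three tuples over three numeric attributes.
rows = [(10, 100, 5), (11, 102, 5), (30, 100, 9)]
thresholds = (1, 5, 0)

print(rfd_holds(rows, thresholds, lhs=[0], rhs=2))
print(rfd_holds(rows, thresholds, lhs=[1], rhs=2))
```

In this toy data the first candidate holds and the second is violated by the first and third tuples. Encoding pairwise similarity as an integer lets a single AND-and-compare test a whole attribute set at once, which is the property a bitwise approach would exploit.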
Pages: 7380-7398
Page count: 19
Related papers
50 in total
  • [1] Incremental discovery of functional dependencies using partitions
    Wang, SL
    Shen, JW
    Hong, TP
    JOINT 9TH IFSA WORLD CONGRESS AND 20TH NAFIPS INTERNATIONAL CONFERENCE, PROCEEDINGS, VOLS. 1-5, 2001, : 1322 - 1326
  • [2] Incremental Discovery of Imprecise Functional Dependencies
    Caruccio, Loredana
    Cirillo, Stefano
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2020, 12 (04):
  • [3] Incremental Discovery of Inclusion Dependencies
    Shaabani, Nuhad
    Meinel, Christoph
    SSDBM 2017: 29TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2017,
  • [4] Fast Incremental Discovery of Pointwise Order Dependencies
    Tan, Zijing
    Ran, Ai
    Ma, Shuai
    Qin, Sheng
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (10): : 1669 - 1681
  • [5] Incremental Discovery of Order Dependencies on Tuple Insertions
    Zhu, Lin
    Sun, Xu
    Tan, Zijing
    Yang, Kejia
    Yang, Weidong
    Zhou, Xiangdong
    Tian, Yingjie
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT I, 2019, 11446 : 157 - 174
  • [6] Mining relaxed functional dependencies from data
    Caruccio, Loredana
    Deufemia, Vincenzo
    Polese, Giuseppe
    DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 34 (02) : 443 - 477
  • [7] Relaxed Functional Dependencies-A Survey of Approaches
    Caruccio, Loredana
    Deufemia, Vincenzo
    Polese, Giuseppe
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (01) : 147 - 165
  • [8] Mining relaxed functional dependencies from data
    Loredana Caruccio
    Vincenzo Deufemia
    Giuseppe Polese
    Data Mining and Knowledge Discovery, 2020, 34 : 443 - 477
  • [9] Efficient discovery of functional and approximate dependencies using partitions
    Huhtala, Y
    Karkkainen, J
    Porkka, P
    Toivonen, H
    14TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1998, : 392 - 401
  • [10] An incremental approach for maintaining functional dependencies
    Gasmi, Ghada
    Lakhal, Lotfi
    Slimani, Yahya
    INTELLIGENT DATA ANALYSIS, 2012, 16 (03) : 365 - 381