Decentralized and Incremental Discovery of Relaxed Functional Dependencies Using Bitwise Similarity

Times cited: 0
Authors
Breve, Bernardo [1]
Caruccio, Loredana [1]
Cirillo, Stefano [1]
Deufemia, Vincenzo [1]
Polese, Giuseppe [1]
Affiliations
[1] Univ Salerno, Dept Comp Sci, I-84084 Fisciano, Italy
Keywords
Heuristic algorithms; Partitioning algorithms; Metadata; Vectors; Lattices; Task analysis; Symbols; Data profiling; relaxed functional dependencies; incremental scenarios; bitwise similarities
DOI
10.1109/TKDE.2024.3403928
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Over the past decade, there have been numerous extensions to the definition of Functional Dependency (FD), culminating in the introduction of Relaxed Functional Dependency (RFD), which offers more flexible constraints than traditional FDs. This increased flexibility makes RFDs well-suited for exploring and profiling datasets with lower data quality. However, efficiently identifying RFDs within dynamic data sources presents a significant challenge, as it requires processing an entire dataset from scratch whenever modifications occur. To tackle this problem, incremental discovery algorithms have been defined, but their performance often degrades when the frequency and the size of batches of updates increase. This article presents a new algorithm, namely D-IndiBits, relying on a new decentralized architecture to balance the workload that drives the incremental discovery process of IndiBits, which is based on bitwise operators for computing attribute similarities. Experiments demonstrate D-IndiBits's effectiveness compared to FD and RFD discovery algorithms on both static and dynamic real-world data. With batches of modifications of sizes 10k and 100k, D-IndiBits is capable of updating the set of RFDs in a few seconds, whereas all other approaches often take more than 3 hours.
Pages: 7380-7398
Page count: 19
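
The abstract mentions bitwise operators for computing attribute similarities but does not give details. The following is only an illustrative sketch of the general idea, not the IndiBits or D-IndiBits algorithm: it assumes each attribute is encoded as one bit per tuple pair (1 when the two values are within a distance threshold) and that a candidate dependency is checked with AND and NOT. The function names `similarity_bits` and `rfd_holds`, the absolute-difference distance, the thresholds, and the sample columns are all hypothetical.

```python
from itertools import combinations


def similarity_bits(values, threshold, distance):
    """Encode one attribute as an int: bit i is 1 when the i-th tuple pair
    (in itertools.combinations order) is within the given threshold."""
    bits = 0
    for i, (a, b) in enumerate(combinations(values, 2)):
        if distance(a, b) <= threshold:
            bits |= 1 << i
    return bits


def rfd_holds(lhs_bits, rhs_bits):
    """Check a candidate X -> A on the encoded pairs (assumed semantics):
    every pair similar on all attributes of X must also be similar on A."""
    lhs = lhs_bits[0]
    for b in lhs_bits[1:]:
        lhs &= b  # pairs similar on every left-hand-side attribute
    # A violation is a pair set in lhs but not in rhs_bits.
    return lhs & ~rhs_bits == 0


def abs_diff(a, b):
    # Hypothetical distance function for numeric attributes.
    return abs(a - b)


# Hypothetical sample columns, four tuples each.
height = [170, 171, 180, 181]
shoe = [40, 40, 44, 45]
age = [30, 50, 31, 52]

height_bits = similarity_bits(height, 1, abs_diff)
shoe_bits = similarity_bits(shoe, 1, abs_diff)
age_bits = similarity_bits(age, 2, abs_diff)

print(rfd_holds([height_bits], shoe_bits))  # height -> shoe
print(rfd_holds([height_bits], age_bits))   # height -> age
```

With this encoding, a check over all tuple pairs reduces to a couple of integer operations. A new tuple contributes only the pairs it forms with existing tuples, which is one plausible reason such an encoding suits incremental settings; whether the paper exploits this in the same way is not stated in the abstract.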