Provenance-aware Discovery of Functional Dependencies on Integrated Views

Cited by: 0
Authors
Comignani, Ugo [1]
Berti-Equille, Laure [2]
Novelli, Noel [3]
Bonifati, Angela [4]
Affiliations
[1] INRIA, Grenoble INP, Tyrex Team, Le Chesnay Rocquencourt, France
[2] IRD, ESPACE DEV, Montpellier, France
[3] Aix Marseille Univ, LIS CNRS, Marseille, France
[4] Lyon 1 Univ, Lyon, France
Keywords
EFFICIENT DISCOVERY; INFERENCE; ALGORITHM
DOI
10.1109/ICDE53745.2022.00051
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The automatic discovery of functional dependencies (FDs) has been widely studied as one of the hardest problems in data profiling. Existing approaches have focused on making FD computation efficient while inspecting a single relation at a time. In this paper, we address for the first time the problem of inferring FDs for multiple relations as they occur in integrated views, using solely the functional dependencies of the view's base relations. To this end, we leverage logical inference and selective mining and show that we can discover most of the exact FDs from the base relations and avoid the full computation of the FDs for the integrated view itself, while preserving the lineage of the FDs of the base relations. We propose algorithms to speed up the inferred FD discovery process and to mine FDs on the fly only from the necessary data partitions. We present InFine (INferred FunctIoNal dEpendency), an end-to-end solution for discovering inferred FDs on integrated views by leveraging provenance information of the base relations. Our experiments on a range of real-world and synthetic datasets demonstrate the benefits of our method over existing FD discovery methods, which need to rerun the discovery process on the view from scratch and cannot exploit lineage information on the FDs. We show that InFine outperforms traditional methods, which require the full integrated view computation, by one to two orders of magnitude in terms of runtime. It is also the most memory-efficient method, and it preserves FD provenance information by relying mainly on inference from the base tables, with negligible execution time.
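The core idea in the abstract, that FDs already known on base relations can be carried over to a view together with their source, can be illustrated with a minimal sketch. This is not the InFine algorithm: the helper names (`holds`, `inner_join`), the toy relations, and the FD list are all hypothetical and chosen only for illustration. The sketch relies on a standard fact: an exact FD that holds on a relation also holds on any inner join involving that relation, because every joined tuple extends a tuple of the base relation.

```python
from itertools import product


def holds(rows, lhs, rhs):
    """Return True if the exact FD lhs -> rhs holds on a list of dict rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, val) != val:
            return False
    return True


def inner_join(left, right, on):
    """Naive inner equi-join of two lists of dict rows on one shared attribute."""
    return [{**l, **r} for l, r in product(left, right) if l[on] == r[on]]


# Hypothetical toy data, not taken from the paper.
patients = [
    {"pid": 1, "name": "Ann", "ward": "A"},
    {"pid": 2, "name": "Bob", "ward": "A"},
    {"pid": 3, "name": "Cy", "ward": "B"},
]
wards = [
    {"ward": "A", "floor": 1},
    {"ward": "B", "floor": 2},
]

# FDs assumed to be already known on each base relation, tagged with their source.
base_fds = [
    ("patients", ("pid",), ("name",)),
    ("wards", ("ward",), ("floor",)),
]

view = inner_join(patients, wards, on="ward")

# Base FDs survive the inner join, so each can be reported with its lineage.
# The check on the view here only confirms that claim on the toy data.
inherited = [(src, lhs, rhs) for src, lhs, rhs in base_fds if holds(view, lhs, rhs)]

# An FD spanning both relations: pid -> ward (patients) and ward -> floor (wards)
# give pid -> floor on the view by transitivity.
cross = holds(view, ("pid",), ("floor",))
```

In this toy setting every base FD is inherited by the view, and the cross-relation FD `pid -> floor` follows from transitivity over the join attribute. FDs that cannot be derived this way would still have to be mined from the data.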
Pages: 621-633
Number of pages: 13