Provenance-aware Discovery of Functional Dependencies on Integrated Views

Cited by: 0
Authors
Comignani, Ugo [1]
Berti-Equille, Laure [2]
Novelli, Noel [3]
Bonifati, Angela [4]
Affiliations
[1] INRIA, Grenoble INP, Tyrex Team, Le Chesnay Rocquencourt, France
[2] IRD, ESPACE DEV, Montpellier, France
[3] Aix Marseille Univ, LIS CNRS, Marseille, France
[4] Lyon 1 Univ, Lyon, France
Source
2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022) | 2022
Keywords
EFFICIENT DISCOVERY; INFERENCE; ALGORITHM;
DOI
10.1109/ICDE53745.2022.00051
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The automatic discovery of functional dependencies (FDs) has been widely studied as one of the hardest problems in data profiling. Existing approaches have focused on making the FD computation efficient while inspecting a single relation at a time. In this paper, for the first time, we address the problem of inferring FDs for multiple relations as they occur in integrated views, using solely the functional dependencies of the base relations of the view itself. To this end, we leverage logical inference and selective mining and show that we can discover most of the exact FDs from the base relations and avoid the full computation of the FDs for the integrated view itself, while at the same time preserving the lineage of the FDs of the base relations. We propose algorithms to speed up the inferred FD discovery process and to mine FDs on the fly only from the necessary data partitions. We present InFine (INferred FunctIoNal dEpendency), an end-to-end solution to discover inferred FDs on integrated views by leveraging provenance information of base relations. Our experiments on a range of real-world and synthetic datasets demonstrate the benefits of our method over existing FD discovery methods, which need to rerun the discovery process on the view from scratch and cannot exploit lineage information on the FDs. We show that InFine outperforms traditional methods that require computing the full integrated view by one to two orders of magnitude in terms of runtime. It is also the most memory-efficient method, and it preserves FD provenance information by relying mainly on inference from the base tables, which takes negligible execution time.
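To make the central notion concrete, the following is a minimal Python sketch of what it means for an exact FD to hold on a relation, and of why an FD of a base relation can be carried over to an inner-join view without re-mining the view. It is not the InFine algorithm; the helper names (`fd_holds`, `natural_join`) and the toy relations (`patients`, `visits`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def fd_holds(rows, lhs, rhs):
    """Return True if the exact FD lhs -> rhs holds on rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key is recorded; any later mismatch violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True


def natural_join(left, right, on):
    """Naive inner equi-join on the attributes in `on` (illustration only)."""
    return [
        {**l, **r}
        for l, r in product(left, right)
        if all(l[a] == r[a] for a in on)
    ]


# Hypothetical base relations.
patients = [
    {"pid": 1, "ward": "A", "floor": 2},
    {"pid": 2, "ward": "A", "floor": 2},
    {"pid": 3, "ward": "B", "floor": 3},
]
visits = [
    {"pid": 1, "doctor": "Lee"},
    {"pid": 1, "doctor": "Kim"},
    {"pid": 3, "doctor": "Lee"},
]

# Integrated view: inner join of the two base relations on pid.
view = natural_join(patients, visits, on=["pid"])

# ward -> floor holds on the base relation, so it also holds on the view:
# an inner join only filters or repeats base tuples, it never alters them.
base_fd = fd_holds(patients, ["ward"], ["floor"])
view_fd = fd_holds(view, ["ward"], ["floor"])

# pid -> doctor fails on visits (pid 1 maps to two doctors).
broken_fd = fd_holds(visits, ["pid"], ["doctor"])
```

In this sketch, the FD `ward -> floor` could be attributed to `patients` (its lineage) without ever scanning the view. FDs that hold only on the view, and not on any base relation, are not covered by this shortcut and would still require inspecting data.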
Pages: 621-633
Number of pages: 13