Fine-Grained Provenance for Matching & ETL

Cited by: 13
Authors
Zheng, Nan [1]
Alawini, Abdussalam [2]
Ives, Zachary G. [1]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Univ Illinois, Urbana, IL 61801 USA
Keywords
WORKFLOW; MANAGEMENT;
DOI
10.1109/ICDE.2019.00025
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Data provenance tools capture the steps used to produce analyses. However, scientists must choose among workflow provenance systems, which allow arbitrary code but only track provenance at the granularity of files; provenance APIs, which provide tuple-level provenance, but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization, but support a limited subset of data science tasks. None of these solutions are well suited for tracing errors introduced during common ETL, record alignment, and matching tasks for data types such as strings, images, etc. Scientists need new capabilities to identify the sources of errors, find why different code versions produce different results, and identify which parameter values affect output. We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content within data objects. PROVision extends database-style provenance techniques to capture equivalences, support optimizations, and enable selective evaluation. We formalize our extensions, implement them in the PROVision system, and validate their effectiveness and scalability for common ETL and matching tasks.
Pages: 184-195
Page count: 12
Related papers
50 in total
  • [1] A Distributed System for The Management of Fine-grained Provenance
    Sultana, Salmin
    Bertino, Elisa
    JOURNAL OF DATABASE MANAGEMENT, 2015, 26 (02) : 32 - 47
  • [2] GeneaLog: Fine-Grained Data Streaming Provenance at the Edge
    Palyvos-Giannas, Dimitris
    Gulisano, Vincenzo
    Papatriantafilou, Marina
    MIDDLEWARE'18: PROCEEDINGS OF THE 2018 ACM/IFIP/USENIX MIDDLEWARE CONFERENCE, 2018, : 227 - 238
  • [3] Fine-grained sticky provenance architecture for office documents
    Mishina, Takuya
    Yoshihama, Sachiko
    Kudo, Michiharu
    ADVANCES IN INFORMATION AND COMPUTER SECURITY, PROCEEDINGS, 2007, 4752 : 336 - +
  • [4] A MATCHING APPROACH TO UTILIZING FINE-GRAINED PARALLELISM
    GUPTA, R
    SOFFA, ML
    PROCEEDINGS OF THE TWENTY-FIRST, ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, VOLS 1-4: ARCHITECTURE TRACK, SOFTWARE TRACK, DECISION SUPPORT AND KNOWLEDGE BASED SYSTEMS TRACK, APPLICATIONS TRACK, 1988, : 148 - 156
  • [5] A Fine-Grained Distribution Approach for ETL Processes in Big Data Environments
    Bala, Mahfoud
    Boussaid, Omar
    Alimazighi, Zaia
    DATA & KNOWLEDGE ENGINEERING, 2017, 111 : 114 - 136
  • [6] Fine-Grained Crowdsourcing for Fine-Grained Recognition
    Jia Deng
    Krause, Jonathan
    Li Fei-Fei
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 580 - 587
  • [7] Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems
    Ruan, Pingcheng
    Chen, Gang
    Tien Tuan Anh Dinh
    Lin, Qian
    Ooi, Beng Chin
    Zhang, Meihui
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2019, 12 (09): : 975 - 988
  • [8] Compact, Tamper-Resistant Archival of Fine-Grained Provenance
    Zheng, Nan
    Ives, Zachary G.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (04): : 485 - 497
  • [9] Fine-grained Interest Matching for Neural News Recommendation
    Wang, Heyuan
    Wu, Fangzhao
    Liu, Zheng
    Xie, Xing
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 836 - 845
  • [10] Hierarchical Part Matching for Fine-Grained Visual Categorization
    Xie, Lingxi
    Tian, Qi
    Hong, Richang
    Yan, Shuicheng
    Zhang, Bo
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 1641 - 1648