Fine-Grained Provenance for Matching & ETL

Cited by: 13
Authors
Zheng, Nan [1]
Alawini, Abdussalam [2]
Ives, Zachary G. [1]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Univ Illinois, Urbana, IL 61801 USA
Keywords
WORKFLOW; MANAGEMENT
DOI
10.1109/ICDE.2019.00025
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data provenance tools capture the steps used to produce analyses. However, scientists must choose among workflow provenance systems, which allow arbitrary code but only track provenance at the granularity of files; provenance APIs, which provide tuple-level provenance, but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization, but support a limited subset of data science tasks. None of these solutions are well suited for tracing errors introduced during common ETL, record alignment, and matching tasks for data types such as strings, images, etc. Scientists need new capabilities to identify the sources of errors, find why different code versions produce different results, and identify which parameter values affect output. We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content within data objects. PROVision extends database-style provenance techniques to capture equivalences, support optimizations, and enable selective evaluation. We formalize our extensions, implement them in the PROVision system, and validate their effectiveness and scalability for common ETL and matching tasks.
Pages: 184-195
Number of pages: 12