Fine-Grained Provenance for Matching & ETL

Cited by: 13
Authors
Zheng, Nan [1]
Alawini, Abdussalam [2]
Ives, Zachary G. [1]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Univ Illinois, Urbana, IL 61801 USA
Keywords
WORKFLOW; MANAGEMENT;
DOI
10.1109/ICDE.2019.00025
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Data provenance tools capture the steps used to produce analyses. However, scientists must choose among workflow provenance systems, which allow arbitrary code but only track provenance at the granularity of files; provenance APIs, which provide tuple-level provenance, but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization, but support a limited subset of data science tasks. None of these solutions are well suited for tracing errors introduced during common ETL, record alignment, and matching tasks for data types such as strings, images, etc. Scientists need new capabilities to identify the sources of errors, find why different code versions produce different results, and identify which parameter values affect output. We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content within data objects. PROVision extends database-style provenance techniques to capture equivalences, support optimizations, and enable selective evaluation. We formalize our extensions, implement them in the PROVision system, and validate their effectiveness and scalability for common ETL and matching tasks.
Pages: 184-195
Page count: 12
Related papers
50 in total
  • [31] Fine-Grained Cryptography
    Degwekar, Akshay
    Vaikuntanathan, Vinod
    Vasudevan, Prashant Nalini
    ADVANCES IN CRYPTOLOGY (CRYPTO 2016), PT III, 2016, 9816 : 533 - 562
  • [32] Fine-grained parallel regular expression matching for deep packet inspection
    Liu, X. (xingkuiliu@ncic.ac.cn), 1600, Science Press (51):
  • [33] (Fractional) Online Stochastic Matching via Fine-Grained Offline Statistics
    Tang, Zhihao Gavin
    Wu, Jinzhao
    Wu, Hongxun
    PROCEEDINGS OF THE 54TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING (STOC '22), 2022, : 77 - 90
  • [34] Fine-grained Similarity Matching with a Similarity Filtration Pyramid for Code Search
    Tan, Cong
    Yang, Shun
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [35] Fine-Grained User Profiling for Personalized Task Matching in Mobile Crowdsensing
    Wu, Fan
    Yang, Shuo
    Zheng, Zhenzhe
    Tang, Shaojie
    Chen, Guihai
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2021, 20 (10) : 2961 - 2976
  • [36] Multi-Scale Fine-Grained Alignments for Image and Sentence Matching
    Li, Wenhui
    Wang, Yan
    Su, Yuting
    Li, Xuanya
    Liu, An-An
    Zhang, Yongdong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 543 - 556
  • [37] ROOM: Rule Organized Optimal Matching for Fine-Grained Traffic Identification
    Li, Hao
    Hu, Chengchen
    2013 PROCEEDINGS IEEE INFOCOM, 2013, : 65 - 69
  • [38] A web service matching method based on fine-grained data semantics
    Li, Yanping
    Journal of Chemical and Pharmaceutical Research, 2014, 6 (06) : 2570 - 2576
  • [39] EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
    Shi, Yaya
    Yang, Xu
    Xu, Haiyang
    Yuan, Chunfeng
    Li, Bing
    Hu, Weiming
    Zha, Zheng-Jun
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17908 - 17917
  • [40] Provenance of Fine-grained Sediments in the Inner Shelf of the Korea Strait (South Sea), Korea
    Um, In kwon
    Choi, Man Sik
    Bae, Sung Ho
    Song, Yunho
    Kong, Gee Soo
    OCEAN SCIENCE JOURNAL, 2018, 53 (01) : 31 - 42