SubZero: A Fine-Grained Lineage System for Scientific Databases

Authors
Wu, Eugene [1]
Madden, Samuel [1]
Stonebraker, Michael [1]
Affiliation
[1] MIT, CSAIL, Cambridge, MA 02139 USA
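Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract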
Data lineage is a key component of provenance that helps scientists track and query relationships between input and output data. While current systems readily support lineage relationships at the file or data array level, finer-grained support at an array-cell level is impractical due to the lack of support for user-defined operators and the high runtime and storage overhead to store such lineage. We interviewed scientists in several domains to identify a set of common semantics that can be leveraged to efficiently store fine-grained lineage. We use these insights to define lineage representations that efficiently capture common locality properties in the lineage data, and a set of APIs so operator developers can easily export lineage information from user-defined operators. Finally, we introduce two benchmarks derived from astronomy and genomics, and show that our techniques can reduce lineage query costs by up to 10x while incurring substantially less impact on workflow runtime and storage.
Pages: 865-876
Page count: 12