SubZero: A Fine-Grained Lineage System for Scientific Databases

Authors
Wu, Eugene [1]
Madden, Samuel [1]
Stonebraker, Michael [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data lineage is a key component of provenance that helps scientists track and query relationships between input and output data. While current systems readily support lineage relationships at the file or data array level, finer-grained support at an array-cell level is impractical due to the lack of support for user-defined operators and the high runtime and storage overhead of storing such lineage. We interviewed scientists in several domains to identify a set of common semantics that can be leveraged to efficiently store fine-grained lineage. We use these insights to define lineage representations that efficiently capture common locality properties in the lineage data, and a set of APIs so operator developers can easily export lineage information from user-defined operators. Finally, we introduce two benchmarks derived from astronomy and genomics, and show that our techniques can reduce lineage query costs by up to 10x while incurring substantially less impact on workflow runtime and storage.
Pages: 865 - 876
Page count: 12