ONEPROVENANCE: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs

Cited by: 0
Authors
Psallidas, Fotis [1 ]
Agrawal, Ashvin [1 ]
Sugunan, Chandru [2 ]
Ibrahim, Khaled [1 ]
Karanasos, Konstantinos [3 ]
Camacho-Rodriguez, Jesus [1 ]
Floratou, Avrilia [1 ]
Curino, Carlo [1 ]
Ramakrishnan, Raghu [1 ]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] Snowflake, Bozeman, MT USA
[3] Meta, Menlo Park, CA USA
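Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2023 / Vol. 16 / Issue 12
Keywords
MANAGEMENT;
DOI
10.14778/3611540.3611555
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract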
Provenance encodes information that connects datasets, their generation workflows, and associated metadata (e.g., who executed a query and when). As such, it is instrumental for a wide range of critical governance applications (e.g., observability and auditing). Unfortunately, in the context of database systems, extracting coarse-grained provenance is a long-standing problem due to the complexity and sheer volume of database workflows. Provenance extraction from query event logs has recently been proposed as a favorable approach because, in principle, it can result in meaningful provenance graphs for provenance applications. Current approaches, however, (a) add substantial overhead to the database and provenance extraction workflows and (b) extract provenance that is noisy, omits query execution dependencies, and is not rich enough for upstream applications. To address these problems, we introduce ONEPROVENANCE: an efficient provenance extraction system from query event logs. ONEPROVENANCE addresses the unique challenges of log-based extraction by (a) identifying query execution dependencies through efficient log analysis, (b) extracting provenance through novel event transformations that account for query dependencies, and (c) introducing effective filtering optimizations. Our thorough experimental analysis shows that ONEPROVENANCE can improve extraction by up to ~18x compared to state-of-the-art baselines; our optimizations reduce the extraction noise and optimize performance even further. ONEPROVENANCE is deployed at scale by Microsoft Purview and actively supports customer provenance extraction needs (https://bit.ly/3N2JVGF).
Pages: 3662 - 3675
Page count: 14