PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems

Times cited: 1
Authors
Han, Runzhou [1]
Zheng, Mai [1]
Byna, Suren [2]
Tang, Houjun [3]
Dong, Bin [3]
Dai, Dong [4]
Chen, Yong [5]
Kim, Dongkyun [5]
Hassoun, Joseph [5]
Thorsley, David [5]
Affiliations
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50014 USA
[2] Ohio State Univ, Columbus, OH 43210 USA
[3] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[4] Univ North Carolina Charlotte, Charlotte, NC 28223 USA
[5] Samsung Res Labs, Mountain View, CA 94043 USA
Keywords
Data provenance; HPC I/O libraries; high performance computing (HPC); scientific data management; workflows; DETECTING DATA RACES
DOI
10.1109/TPDS.2024.3374555
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations. In this paper, we analyze four representative scientific workflows in collaboration with the domain scientists to identify concrete provenance needs. Based on the first-hand analysis, we propose a provenance framework called PROV-IO+, which includes an I/O-centric provenance model for describing scientific data and the associated I/O operations and environments precisely. Moreover, we build a prototype of PROV-IO+ to enable end-to-end provenance support on real HPC systems with little manual effort. The PROV-IO+ framework can support both containerized and non-containerized workflows on different HPC platforms with flexibility in selecting various classes of provenance. Our experiments with realistic workflows show that PROV-IO+ can address the provenance needs of the domain scientists effectively with reasonable performance (e.g., less than 3.5% tracking overhead for most experiments). Moreover, PROV-IO+ outperforms a state-of-the-art system (i.e., ProvLake) in our experiments.
Pages: 844-861
Page count: 18