PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems

Authors
Han, Runzhou [1]
Zheng, Mai [1]
Byna, Suren [2]
Tang, Houjun [3]
Dong, Bin [3]
Dai, Dong [4]
Chen, Yong [5]
Kim, Dongkyun [5]
Hassoun, Joseph [5]
Thorsley, David [5]
Affiliations
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50014 USA
[2] Ohio State Univ, Columbus, OH 43210 USA
[3] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[4] Univ North Carolina Charlotte, Charlotte, NC 28223 USA
[5] Samsung Res Labs, Mountain View, CA 94043 USA
Keywords
Data provenance; HPC I/O libraries; high performance computing (HPC); scientific data management; workflows; DETECTING DATA RACES
DOI
10.1109/TPDS.2024.3374555
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance information (e.g., the origins of data products and the usage patterns of datasets). Unfortunately, existing provenance solutions cannot meet these needs because of their incompatible provenance models and/or system implementations. In this paper, we analyze four representative scientific workflows in collaboration with domain scientists to identify concrete provenance needs. Based on this first-hand analysis, we propose a provenance framework called PROV-IO+, which includes an I/O-centric provenance model that precisely describes scientific data and the associated I/O operations and environments. We also build a prototype of PROV-IO+ that enables end-to-end provenance support on real HPC systems with little manual effort. The PROV-IO+ framework supports both containerized and non-containerized workflows on different HPC platforms and allows flexible selection of the classes of provenance to track. Our experiments with realistic workflows show that PROV-IO+ can address the domain scientists' provenance needs effectively with reasonable performance (e.g., less than 3.5% tracking overhead in most experiments). Moreover, PROV-IO+ outperforms a state-of-the-art system (ProvLake) in our experiments.
Pages: 844–861
Page count: 18