PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems

Cited by: 1
Authors
Han, Runzhou [1]
Zheng, Mai [1]
Byna, Suren [2]
Tang, Houjun [3]
Dong, Bin [3]
Dai, Dong [4]
Chen, Yong [5]
Kim, Dongkyun [5]
Hassoun, Joseph [5]
Thorsley, David [5]
Affiliations
[1] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50014 USA
[2] Ohio State Univ, Columbus, OH 43210 USA
[3] Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[4] Univ North Carolina Charlotte, Charlotte, NC 28223 USA
[5] Samsung Res Labs, Mountain View, CA 94043 USA
Keywords
Data provenance; HPC I/O libraries; high performance computing (HPC); scientific data management; workflows; DETECTING DATA RACES
DOI
10.1109/TPDS.2024.3374555
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations. In this paper, we analyze four representative scientific workflows in collaboration with the domain scientists to identify concrete provenance needs. Based on the first-hand analysis, we propose a provenance framework called PROV-IO+, which includes an I/O-centric provenance model for describing scientific data and the associated I/O operations and environments precisely. Moreover, we build a prototype of PROV-IO+ to enable end-to-end provenance support on real HPC systems with little manual effort. The PROV-IO+ framework can support both containerized and non-containerized workflows on different HPC platforms with flexibility in selecting various classes of provenance. Our experiments with realistic workflows show that PROV-IO+ can address the provenance needs of the domain scientists effectively with reasonable performance (e.g., less than 3.5% tracking overhead for most experiments). Moreover, PROV-IO+ outperforms a state-of-the-art system (i.e., ProvLake) in our experiments.
Pages: 844-861
Page count: 18