Container-based bioinformatics with Pachyderm

Cited by: 27
Authors
Novella, Jon Ander [1, 2]
Emami Khoonsari, Payam [3]
Herman, Stephanie [1, 2, 3]
Whitenack, Daniel [4]
Capuccini, Marco [1, 2, 5]
Burman, Joachim [6]
Kultima, Kim [3]
Spjuth, Ola [1, 2]
Affiliations
[1] Uppsala Univ, Dept Pharmaceut Biosci, S-75214 Uppsala, Sweden
[2] Uppsala Univ, Sci Life Lab, S-75214 Uppsala, Sweden
[3] Uppsala Univ, Dept Med Sci, Clin Chem, S-75185 Uppsala, Sweden
[4] Pachyderm Inc, San Francisco, CA 94107 USA
[5] Uppsala Univ, Dept Informat Technol, S-75105 Uppsala, Sweden
[6] Uppsala Univ, Dept Neurosci, S-75185 Uppsala, Sweden
Funding
European Union Horizon 2020; Swedish Research Council
Keywords
MASS-SPECTROMETRY
DOI
10.1093/bioinformatics/bty699
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Computational biologists face many challenges related to data size, and they need to manage complicated analyses often including multiple stages and multiple tools, all of which must be deployed to modern infrastructures. To address these challenges and maintain reproducibility of results, researchers need (i) a reliable way to run processing stages in any computational environment, (ii) a well-defined way to orchestrate those processing stages and (iii) a data management layer that tracks data as it moves through the processing pipeline.

Results: Pachyderm is an open-source workflow system and data management framework that fulfils these needs by creating a data pipelining and data versioning layer on top of projects from the container ecosystem, having Kubernetes as the backbone for container orchestration. We adapted Pachyderm and demonstrated its attractive properties in bioinformatics. A Helm Chart was created so that researchers can use Pachyderm in multiple scenarios. The Pachyderm File System was extended to support block storage. A wrapper for initiating Pachyderm on cloud-agnostic virtual infrastructures was created. The benefits of Pachyderm are illustrated via a large metabolomics workflow, demonstrating that Pachyderm enables efficient and sustainable data science workflows while maintaining reproducibility and scalability.
Pages: 839-846
Page count: 8