Challenges of Large-scale Biomedical Workflows on the Cloud - A Case Study on the Need for Reproducibility of Results

Cited by: 5
Authors
Kanwal, Sehrish [1]
Lonie, Andrew [1]
Sinnott, Richard O. [1]
Anderson, Charlotte [1]
Affiliation
[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic 3010, Australia
Keywords
bioinformatics workflows; distributed compute resources; exome; NeCTAR Research Cloud; reproducibility; SQUAMOUS-CELL CARCINOMA; QUALITY; HEAD; TOOL
DOI
10.1109/CBMS.2015.28
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Computational bioinformatics workflows are extensively used to analyse genomics data. With the unprecedented advancements in genomic sequencing technology and the opportunities for personalized medicine, it is essential that analysis results are repeatable by others, especially when moving into clinical environments. To cope with the complex computational demands of huge biological datasets, a shift to distributed compute resources is unavoidable. A case study was conducted in which three well-established bioinformatics analysis groups across Australia were assigned to analyse exome sequence data from a range of patients with a rare condition: disorder of sex development. Initially these groups used their own in-house data processing pipelines, and subsequently they used a common bioinformatics workbench based upon Galaxy and offered through the Australia-wide National eResearch Collaboration Tools and Resources (NeCTAR) Research Cloud. This paper describes our experiences in this work and the variability of the results obtained. We put forward principles that should be used to ensure the reproducibility of scientific results moving forward.
Pages: 220-225
Page count: 6