Porting the Variant Calling Pipeline for NGS data in cloud-HPC environment

Cited by: 1
Authors
Mulone, Alberto [1]
Awad, Sherine [2]
Chiarugi, Davide [3]
Aldinucci, Marco [1]
Affiliations
[1] Univ Turin, Dept Comp Sci, Turin, Italy
[2] Univ Cambridge, Inst Metab Sci, Cambridge, England
[3] Max Planck Inst Human Cognit & Brain Sci, Res & Dev Grp, Comp & Databases Serv, Leipzig, Germany
Source
2023 IEEE 47TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC | 2023
Funding
EU Horizon 2020;
Keywords
StreamFlow; Hybrid workflow; High Performance Computing; cloud computing; FRAMEWORK;
DOI
10.1109/COMPSAC57700.2023.00288
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In recent years we have understood the importance of sequencing and analyzing human genetic variation. A relevant aspect that emerged from the COVID-19 pandemic was the need to obtain results very quickly; this involved using High-Performance Computing (HPC) environments to execute the Next Generation Sequencing (NGS) pipeline. However, HPC is not always the most suitable environment for executing an entire pipeline, especially when it involves many heterogeneous tools. The ability to execute parts of the pipeline in different environments can lead not only to higher performance but also to cheaper executions. This work shows the design and optimization process that led us to a state-of-the-art Variant Calling hybrid workflow based on the StreamFlow Workflow Management System (WfMS). We also compare StreamFlow with Snakemake, an established WfMS targeting HPC facilities, observing comparable performance on single environments and satisfactory improvements with a hybrid cloud-HPC configuration.
Pages: 1858-1863
Page count: 6