PGen: large-scale genomic variations analysis workflow and browser in SoyKB

Cited by: 20
Authors
Liu, Yang [1, 2]
Khan, Saad M. [1, 2]
Wang, Juexin [2, 3]
Rynge, Mats [4]
Zhang, Yuanxun [3]
Zeng, Shuai [2, 3]
Chen, Shiyuan [2, 3]
dos Santos, Joao V. Maldonado [5]
Valliyodan, Babu [5, 6]
Calyam, Prasad P.
Merchant, Nirav [7]
Nguyen, Henry T. [5, 6]
Xu, Dong [1, 2, 3]
Joshi, Trupti [1, 2, 3, 8, 9]
Affiliations
[1] Univ Missouri, Informat Inst, Columbia, MO 65211 USA
[2] Univ Missouri, Christopher S Bond Life Sci Ctr, Columbia, MO 65211 USA
[3] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA
[4] Univ Southern Calif, Informat Sci Inst, Los Angeles, CA USA
[5] Univ Missouri, Div Plant Sci, Columbia, MO USA
[6] Natl Ctr Soybean Biotechnol, Columbia, MO USA
[7] Univ Arizona, iPlant Collaborat, Tucson, AZ USA
[8] Univ Missouri, Sch Med, Dept Mol Microbiol & Immunol, Columbia, MO 65212 USA
[9] Univ Missouri, Sch Med, Off Res, Columbia, MO 65211 USA
Source
BMC BIOINFORMATICS | 2016 / Vol. 17
Keywords
DISCOVERY;
DOI
10.1186/s12859-016-1227-y
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of crop germplasm to detect genome-scale genetic variations and to apply that knowledge towards trait improvement. To facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and the Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way.
Results: We have developed both a Linux version in GitHub (https://github.com/pegasus-isi/PGen-GenomicVariationsWorkflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 SNPs and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. This analysis also identified 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions. SNPs identified using PGen from additional soybean resequencing projects, bringing the total to 500+ soybean germplasm lines, have been integrated. These SNPs are being used for trait improvement through genotype-to-phenotype prediction approaches developed in-house. We have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB to give soybean researchers easy access to SNP and downstream analysis results.
Conclusion: The PGen workflow has been optimized for efficient analysis of soybean data through thorough testing and validation. This research serves as an example of best practices for developing genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. The PGen workflow can also be easily customized for analysis of data in other species.
Pages: 10