GemSIM: general, error-model based simulator of next-generation sequencing data

Cited by: 114
Authors
McElroy, Kerensa E. [1, 2, 3]
Luciani, Fabio [3]
Thomas, Torsten [1, 2]
Affiliations
[1] UNSW, Ctr Marine Bioinnovat, Sydney, NSW 2052, Australia
[2] UNSW, Sch Biotechnol & Biomol Sci, Sydney, NSW 2052, Australia
[3] Univ New S Wales, Sch Med Sci, Inflammat & Infect Res Grp, Sydney, NSW 2052, Australia
Source
BMC GENOMICS | 2012 / Volume 13
Funding
Medical Research Council (UK); National Health and Medical Research Council (Australia);
Keywords
QUALITY; ACCURACY; FORMAT;
DOI
10.1186/1471-2164-13-74
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: GemSIM, or General Error-Model based SIMulator, is a next-generation sequencing simulator capable of generating single or paired-end reads for any sequencing technology compatible with the generic formats SAM and FASTQ (including Illumina and Roche/454). GemSIM creates and uses empirically derived, sequence-context based error models to realistically emulate individual sequencing runs and/or technologies. Empirical fragment length and quality score distributions are also used. Reads may be drawn from one or more genomes or haplotype sets, facilitating simulation of deep sequencing, metagenomic, and resequencing projects.
Results: We demonstrate GemSIM's value by deriving error models from two different Illumina sequencing runs and one Roche/454 run, and comparing and contrasting the resulting error profiles of each run. Overall error rates varied dramatically between individual Illumina runs, between the first and second reads in each pair, and between datasets from Illumina and Roche/454 technologies. Indels were markedly more frequent in Roche/454 data than in Illumina data, and both technologies suffered from an increase in error rates near the end of each read. The effects of these different profiles on low-frequency SNP-calling accuracy were investigated by analysing simulated sequencing data for a mixture of bacterial haplotypes. In general, SNP-calling using VarScan was only accurate for SNPs with frequency > 3%, independent of which error model was used to simulate the data. Variation between error profiles interacted strongly with VarScan's 'minimum average quality' parameter, resulting in different optimal settings for different sequencing runs.
Conclusions: Next-generation sequencing has unprecedented potential for assessing genetic diversity; however, analysis is complicated because error profiles can vary noticeably even between different runs of the same technology. Simulation with GemSIM can help overcome this problem by providing insights into the error profiles of individual sequencing runs and allowing researchers to assess the effects of these errors on downstream data analysis.
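The idea of an empirically derived error model can be illustrated with a minimal Python sketch. This is not GemSIM's code or interface: the function names `derive_position_error_rates` and `simulate_read` are hypothetical, and the model is deliberately reduced to per-position substitution rates, ignoring the sequence context, indels, quality scores, and fragment length distributions that the abstract says GemSIM models. It assumes the input reads are already aligned to their reference segments without gaps.

```python
import random
from collections import Counter


def derive_position_error_rates(aligned_pairs, read_length):
    """Estimate a substitution rate for each read position.

    aligned_pairs: iterable of (read, reference_segment) strings that are
    assumed to be gap-free alignments (a simplification for illustration).
    """
    mismatches = Counter()
    totals = Counter()
    for read, ref in aligned_pairs:
        for pos, (read_base, ref_base) in enumerate(zip(read, ref)):
            if pos >= read_length:
                break
            totals[pos] += 1
            if read_base != ref_base:
                mismatches[pos] += 1
    # Positions never observed get a rate of 0.0.
    return [
        mismatches[pos] / totals[pos] if totals[pos] else 0.0
        for pos in range(read_length)
    ]


def simulate_read(genome, start, error_rates, rng):
    """Copy a fragment of the genome and apply position-specific substitutions."""
    bases = "ACGT"
    read = []
    for pos, rate in enumerate(error_rates):
        true_base = genome[start + pos]
        if rng.random() < rate:
            # Substitute with one of the three other bases, chosen uniformly.
            read.append(rng.choice([b for b in bases if b != true_base]))
        else:
            read.append(true_base)
    return "".join(read)


if __name__ == "__main__":
    # Toy "empirical" data: the second read has an error at its last position.
    observed = [("ACGT", "ACGT"), ("ACGA", "ACGT")]
    rates = derive_position_error_rates(observed, read_length=4)
    rng = random.Random(0)  # fixed seed so repeated runs give the same reads
    genome = "ACGTACGTACGT"
    for start in (0, 4, 8):
        print(simulate_read(genome, start, rates, rng))
```

In this toy model the error rate rises only where the training reads showed mismatches, loosely mirroring the observation that error rates increase near the end of each read. A sequence-context model would condition the rate on neighbouring bases as well as on position.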
Pages: 9