Variant Callers for Next-Generation Sequencing Data: A Comparison Study

Cited by: 112
Authors
Liu, Xiangtao [1 ,2 ]
Han, Shizhong [1 ,2 ]
Wang, Zuoheng [3 ]
Gelernter, Joel [1 ,2 ,4 ,5 ]
Yang, Bao-Zhu [1 ,2 ]
Affiliations
[1] Yale Univ, Sch Med, Dept Psychiat, Div Human Genet, New Haven, CT 06520 USA
[2] VA CT Hlth Care Ctr, West Haven, CT USA
[3] Yale Univ, Sch Publ Hlth, Dept Biostat, New Haven, CT USA
[4] Yale Univ, Sch Med, Dept Genet, New Haven, CT 06510 USA
[5] Yale Univ, Sch Med, Dept Neurobiol, New Haven, CT USA
Source
PLOS ONE | 2013 / Vol. 8 / Issue 09
Funding
National Institutes of Health (USA);
Keywords
MAPREDUCE; FRAMEWORK; GENOTYPE; FORMAT;
DOI
10.1371/journal.pone.0075619
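Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract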
Next-generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the calling strategies implemented. We studied the performance of four prevailing callers, SAMtools, GATK, glftools and Atlas2, using single-sample and multiple-sample variant-calling strategies. Using the same aligner, BWA, we built four single-sample and three multiple-sample calling pipelines and applied the pipelines to whole exome sequencing data taken from 20 individuals. We obtained genotypes generated by Illumina Infinium HumanExome v1.1 Beadchip for validation analysis and then used Sanger sequencing as a "gold-standard" method to resolve discrepancies for selected regions of high discordance. Finally, we compared the sensitivity of three of the single-sample calling pipelines using known simulated whole genome sequence data as a gold standard. Overall, for single-sample calling, the called variants were highly consistent across callers and the pairwise overlapping rate was about 0.9. Compared with other callers, GATK had the highest rediscovery rate (0.9969) and specificity (0.99996), and the Ti/Tv ratio from GATK was closest to the expected value of 3.02. Multiple-sample calling increased the sensitivity. Results from the simulated data suggested that GATK outperformed SAMtools and glfSingle in sensitivity, especially for low-coverage data. Further, for the selected discrepant regions evaluated by Sanger sequencing, variant genotypes called by exome sequencing were more accurate than those from the exome array, although the average variant sensitivity and overall genotype consistency rate were as high as 95.87% and 99.82%, respectively. In conclusion, GATK showed several advantages over other variant callers for general-purpose NGS analyses. The GATK pipelines we developed perform very well.
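The abstract compares callers by Ti/Tv ratio, pairwise overlapping rate, sensitivity, and specificity. The Python sketch below illustrates how such metrics could be computed from sets of called variants. It is not the authors' code: the `Variant` tuple layout, all function names, and the toy call sets are hypothetical, and the overlap rate is assumed here to be intersection over union, which may differ from the definition used in the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(variants: Iterable[Variant]) -> float:
    """Transition/transversion ratio over single-nucleotide variants."""
    ti = tv = 0
    for _chrom, _pos, ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # skip indels and non-variant records
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; Ti/Tv is undefined")
    return ti / tv


def overlap_rate(a: Set[Variant], b: Set[Variant]) -> float:
    """Assumed definition: shared calls divided by the union of calls."""
    union = a | b
    if not union:
        raise ValueError("both call sets are empty")
    return len(a & b) / len(union)


def sensitivity(called: Set[Variant], truth: Set[Variant]) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return len(called & truth) / len(truth)


def specificity(called: Set[Variant], truth: Set[Variant],
                assayed: Set[Variant]) -> float:
    """Fraction of assayed non-variant sites left uncalled: TN / (TN + FP)."""
    negatives = assayed - truth
    return len(negatives - called) / len(negatives)


# Toy data, invented purely for illustration.
caller_a = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 300, "A", "C")}
caller_b = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 400, "G", "T")}
truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 400, "G", "T")}
assayed = truth | {("1", 300, "A", "C"), ("1", 500, "G", "A")}

print(titv_ratio(caller_a))                    # 2 transitions / 1 transversion
print(overlap_rate(caller_a, caller_b))        # 2 shared / 4 distinct calls
print(sensitivity(caller_a, truth))            # 2 of 3 true variants found
print(specificity(caller_a, truth, assayed))   # 1 of 2 negative sites uncalled
```

In a validation like the one described, `truth` and `assayed` would come from an independent genotyping platform such as an exome array, while the call sets would be parsed from each pipeline's output.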
Pages: 11
Related papers
50 in total
  • [41] Genotyping microsatellites in next-generation sequencing data
    Dashnow, Harriet
    Tan, Susan
    Das, Debjani
    Easteal, Simon
    Oshlack, Alicia
    BMC BIOINFORMATICS, 2015, 16
  • [42] Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data
    Sarah Sandmann
    Aniek O. de Graaf
    Mohsen Karimi
    Bert A. van der Reijden
    Eva Hellström-Lindberg
    Joop H. Jansen
    Martin Dugas
    Scientific Reports, 7
  • [43] Empirical Bayes single nucleotide variant-calling for next-generation sequencing data
    Ali Karimnezhad
    Theodore J. Perkins
    Scientific Reports, 14
  • [44] Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data
    Kosugi, Shunichi
    Natsume, Satoshi
    Yoshida, Kentaro
    MacLean, Daniel
    Cano, Liliana
    Kamoun, Sophien
    Terauchi, Ryohei
    PLOS ONE, 2013, 8 (10)
  • [45] MutationValidator: A computational method for variant cross-validation in next-generation sequencing data
    Rosenberg, Mara
    Getz, Gad
    Kiezun, Adam
    Sivachenko, Andrey
    CANCER RESEARCH, 2014, 74 (19)
  • [46] Next-generation sequencing for next-generation breeding, and more
    Tsai, Chung-Jui
    NEW PHYTOLOGIST, 2013, 198 (03) : 635 - 637
  • [47] Next-Generation Sequencing: Next-Generation Quality in Pediatrics
    Wortmann, Saskia B.
    Spenger, Johannes
    Preisel, Martin
    Koch, Johannes
    Rauscher, Christian
    Bader, Ingrid
    Mayr, Johannes A.
    Sperl, Wolfgang
    PADIATRIE UND PADOLOGIE, 2018, 53 (06) : 278 - 283
  • [48] A multi-site comparison of copy number variant callers for germline next generation sequencing using targeted capture
    Piatek, S. G.
    Davies, A. C.
    Lombard, P.
    Ahlfors, H.
    Jenkins, L.
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2019, 27 : 1694 - 1695
  • [49] Next-Generation Sequencing Demands Next-Generation Phenotyping
    Hennekam, Raoul C. M.
    Biesecker, Leslie G.
    HUMAN MUTATION, 2012, 33 (05) : 884 - 886
  • [50] Validation and assessment of variant calling pipelines for next-generation sequencing
    Pirooznia, Mehdi
    Kramer, Melissa
    Parla, Jennifer
    Goes, Fernando S.
    Potash, James B.
    McCombie, W. Richard
    Zandi, Peter P.
    HUMAN GENOMICS, 2014, 8 : 14