An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data

Cited by: 48
Authors
Zhang, Jin [1]
Wang, Jiayin [1]
Wu, Yufeng [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
National Science Foundation (USA);
Keywords
NUCLEOTIDE-RESOLUTION; COPY NUMBER; VARIANTS; BREAKPOINTS; DELETIONS; ALIGNMENT;
DOI
10.1186/1471-2105-13-S6-S6
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Recent advances in sequencing technologies make it possible to comprehensively study structural variations (SVs) using sequence data of large-scale populations. Currently, more effort is going into developing methods that call SVs with exact breakpoints. Among these approaches, split-read mapping methods can be applied to low-coverage sequence data. With the increasing amount of data being generated, more efficient split-read mapping methods are still needed. Also, since sequence errors cannot be avoided with current sequencing technologies, more accurate split-read mapping methods are still needed to better handle sequence errors.

Results: In this paper, we present a split-read mapping method implemented in the program SVseq2, which improves on our previous work, SVseq1. Similar to SVseq1, SVseq2 calls deletions (and insertions) with exact breakpoints. SVseq2 achieves more accurate calling through split-read mapping within focal regions. SVseq2 also has a much-desired feature: there is no need to specify the maximum deletion size, whereas some existing split-read mapping methods need more memory and longer running time when a larger maximum deletion size is chosen. SVseq2 is also much faster because it only needs to examine a small number of ways of splitting the reads. Moreover, SVseq2 supports insertion calling from low-coverage sequence data, while SVseq1 only supports deletion finding. The program SVseq2 can be downloaded at http://www.engr.uconn.edu/~jiz08001/.

Conclusions: SVseq2 enables accurate and efficient SV calling through split-read mapping within focal regions using paired-end reads. On many simulated and real sequence data sets, SVseq2 outperforms some other existing approaches in accuracy and efficiency, especially when sequence coverage is low.
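The general idea of split-read mapping for deletion calling, as described in the abstract, can be illustrated with a toy sketch. The Python below is not SVseq2's algorithm: it assumes exact matching with no sequencing errors, a single read whose start position on the reference is already known, and at most one deletion. The names `call_deletion`, `min_anchor`, and `focal_end` are hypothetical and chosen only for this illustration; `focal_end` loosely mimics restricting the search to a focal region.

```python
from typing import Optional, Tuple


def call_deletion(reference: str, read: str, anchor_start: int,
                  min_anchor: int = 5,
                  focal_end: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Toy split-read search for one deletion (exact matches only).

    The read is assumed to start aligning at reference[anchor_start].
    Returns the deleted reference interval as (start, end), half-open,
    or None if no split placement is found.
    """
    end = len(reference) if focal_end is None else focal_end

    # Length of the longest read prefix that matches at the anchor.
    prefix_len = 0
    while (prefix_len < len(read)
           and anchor_start + prefix_len < len(reference)
           and read[prefix_len] == reference[anchor_start + prefix_len]):
        prefix_len += 1

    if prefix_len == len(read):
        return None  # the whole read aligns, so no split is needed

    # Try split points, starting from the longest matching prefix.
    for k in range(prefix_len, min_anchor - 1, -1):
        suffix = read[k:]
        if len(suffix) < min_anchor:
            continue  # suffix too short to place reliably
        # The suffix must map strictly downstream of the prefix end.
        pos = reference.find(suffix, anchor_start + k + 1, end)
        if pos != -1:
            return (anchor_start + k, pos)
    return None


# Hypothetical example: an 8-base run of T is deleted in the sample.
reference = "GATTACAGGC" + "TTTTTTTT" + "CCATGAGTCA"
read = "TTACAGGC" + "CCATGAGT"  # spans the deletion breakpoint
print(call_deletion(reference, read, anchor_start=2))  # (10, 18)
```

Because the search only walks back from the longest matching prefix, it examines few split points per read, which is in the spirit of the efficiency argument in the abstract. A real caller would also have to tolerate mismatches and resolve ambiguous breakpoints.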
Pages: 11
Related papers
50 in total
  • [21] Hybrid peeling for fast and accurate calling, phasing, and imputation with sequence data of any coverage in pedigrees
    Whalen, Andrew
    Ros-Freixedes, Roger
    Wilson, David L.
    Gorjanc, Gregor
    Hickey, John M.
    GENETICS SELECTION EVOLUTION, 2018, 50
  • [22] Fast and Accurate 1000 Genomes Imputation Using Summary Statistics or Low-coverage Sequencing Data
    Pasaniuc, Bogdan
    Zaitlen, Noah
    Bhatia, Gaurav
    Gusev, Alexander
    Patterson, Nick
    Price, Alkes L.
    GENETIC EPIDEMIOLOGY, 2012, 36 (07) : 765 - 765
  • [23] Efficient phasing and imputation of low-coverage sequencing data using large reference panels
    Rubinacci, S.
    Ribeiro, D.
    Hofmeister, R.
    Delaneau, O.
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2020, 28 (SUPPL 1) : 658 - 659
  • [24] Efficient phasing and imputation of low-coverage sequencing data using large reference panels
    Rubinacci, Simone
    Ribeiro, Diogo M.
    Hofmeister, Robin J.
    Delaneau, Olivier
    NATURE GENETICS, 2021, 53 (01) : 120 - 126
  • [25] Efficient phasing and imputation of low-coverage sequencing data using large reference panels
    Simone Rubinacci
    Diogo M. Ribeiro
    Robin J. Hofmeister
    Olivier Delaneau
    Nature Genetics, 2021, 53 : 120 - 126
  • [26] Targeted analysis of polymorphic loci from low-coverage shotgun sequence data allows accurate genotyping of HLA genes in historical human populations
    Pierini, Federica
    Nutsua, Marcel
    Boehme, Lisa
    Ozer, Onur
    Bonczarowska, Joanna
    Susat, Julian
    Franke, Andre
    Nebel, Almut
    Krause-Kyora, Ben
    Lenz, Tobias L.
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [27] Targeted analysis of polymorphic loci from low-coverage shotgun sequence data allows accurate genotyping of HLA genes in historical human populations
    Federica Pierini
    Marcel Nutsua
    Lisa Böhme
    Onur Özer
    Joanna Bonczarowska
    Julian Susat
    Andre Franke
    Almut Nebel
    Ben Krause-Kyora
    Tobias L. Lenz
    Scientific Reports, 10
  • [28] Accurate genotype imputation from low-coverage whole-genome sequencing data of rainbow trout
    Liu, Sixin
    Martin, Kyle E.
    Snelling, Warren M.
    Long, Roseanna
    Leeds, Timothy D.
    Vallejo, Roger L.
    Wiens, Gregory D.
    Palti, Yniv
    G3-GENES GENOMES GENETICS, 2024, 14 (09)
  • [29] Estimating optimal window size for analysis of low-coverage next-generation sequence data
    Gusnanto, Arief
    Taylor, Charles C.
    Nafisah, Ibrahim
    Wood, Henry M.
    Rabbitts, Pamela
    Berri, Stefano
    BIOINFORMATICS, 2014, 30 (13) : 1823 - 1829
  • [30] Variant calling in low-coverage whole genome sequencing of a Native American population sample
    Bizon, Chris
    Spiegel, Michael
    Chasse, Scott A.
    Gizer, Ian R.
    Li, Yun
    Malc, Ewa P.
    Mieczkowski, Piotr A.
    Sailsbery, Josh K.
    Wang, Xiaoshu
    Ehlers, Cindy L.
    Wilhelmsen, Kirk C.
    BMC GENOMICS, 2014, 15