An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data

Times cited: 48
Authors
Zhang, Jin [1]
Wang, Jiayin [1]
Wu, Yufeng [1]
Affiliation
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
U.S. National Science Foundation
Keywords
NUCLEOTIDE-RESOLUTION; COPY NUMBER; VARIANTS; BREAKPOINTS; DELETIONS; ALIGNMENT
DOI
10.1186/1471-2105-13-S6-S6
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Recent advances in sequencing technologies make it possible to study structural variations (SVs) comprehensively using sequence data from large-scale populations. Much recent effort has gone into developing methods that call SVs with exact breakpoints. Among these approaches, split-read mapping methods can be applied to low-coverage sequence data. As the amount of generated data grows, more efficient split-read mapping methods are still needed. In addition, because sequencing errors cannot be avoided with current sequencing technologies, more accurate split-read mapping methods that better handle these errors are also needed.

Results: In this paper, we present a split-read mapping method, implemented in the program SVseq2, that improves on our previous work, SVseq1. Like SVseq1, SVseq2 calls deletions (and insertions) with exact breakpoints. SVseq2 achieves more accurate calling by performing split-read mapping within focal regions. SVseq2 also has a highly desirable feature: there is no need to specify a maximum deletion size, whereas some existing split-read mapping methods require more memory and longer running time when a larger maximum deletion size is chosen. SVseq2 is also much faster, because it only needs to examine a small number of ways of splitting each read. Moreover, SVseq2 supports insertion calling from low-coverage sequence data, while SVseq1 only supports deletion finding. SVseq2 can be downloaded at http://www.engr.uconn.edu/~jiz08001/.

Conclusions: SVseq2 enables accurate and efficient SV calling through split-read mapping within focal regions using paired-end reads. On many simulated data sets and on real sequence data, SVseq2 outperforms some other existing approaches in accuracy and efficiency, especially when sequence coverage is low.
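The basic idea of calling a deletion by split-read mapping inside a focal region can be illustrated with a minimal Python sketch. This is not SVseq2's algorithm and does not reproduce its optimizations; it only shows the general technique under stated assumptions. We assume the focal region is already given as a reference window (in practice it would be derived from the mapped mate of a paired-end read), that each read piece is placed without gaps, and that a mismatch budget stands in for sequencing errors. The names `DeletionCall`, `placements`, `split_read_deletions`, `min_anchor`, and `max_mm` are hypothetical.

```python
# Hypothetical illustration of split-read deletion calling within a focal
# region; not the SVseq2 implementation.
from typing import List, NamedTuple, Tuple


class DeletionCall(NamedTuple):
    """Candidate deletion: ref[start:end] is absent from the read (0-based, end exclusive)."""
    start: int
    end: int
    split: int  # read offset at which the read was split into prefix/suffix


def placements(piece: str, ref: str, lo: int, hi: int, max_mm: int) -> List[Tuple[int, int]]:
    """Return (position, mismatches) for every gap-free placement of `piece`
    inside ref[lo:hi] with at most `max_mm` mismatches."""
    hits = []
    for pos in range(lo, hi - len(piece) + 1):
        mm = sum(1 for a, b in zip(piece, ref[pos:pos + len(piece)]) if a != b)
        if mm <= max_mm:
            hits.append((pos, mm))
    return hits


def split_read_deletions(read: str, ref: str, region: Tuple[int, int],
                         min_anchor: int = 8, max_mm: int = 0) -> List[DeletionCall]:
    """Try every split of `read` into prefix + suffix (each >= min_anchor bases)
    and report placements where the suffix lands downstream of the prefix inside
    the focal region. The skipped reference bases form the deletion. No maximum
    deletion size is needed because the search never leaves the focal region."""
    lo, hi = region
    calls = []
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        for p, mm_p in placements(prefix, ref, lo, hi, max_mm):
            del_start = p + split
            # The suffix must start at least one base after the prefix ends,
            # and both pieces together may use at most max_mm mismatches.
            for q, _ in placements(suffix, ref, del_start + 1, hi, max_mm - mm_p):
                calls.append(DeletionCall(del_start, q, split))
    return calls


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTACCGATGCATCCGGAATTCGCTAGTCAGGTACCATG"
    read = ref[:20] + ref[30:]  # simulate a 10-bp deletion of ref[20:30]
    print(split_read_deletions(read, ref, region=(0, len(ref))))
    # → [DeletionCall(start=20, end=30, split=20)]
```

Because this sketch tries every split point, its cost grows with read length times focal-region size; the abstract states that SVseq2 only needs to examine a small number of splits, which the sketch makes no attempt to imitate. Repeated bases at a breakpoint (microhomology) would also yield several equivalent calls shifted by the repeat length, which a practical caller would need to normalize.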
Pages: 11