An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data

Cited by: 48
Authors
Zhang, Jin [1]
Wang, Jiayin [1]
Wu, Yufeng [1]
Affiliation
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
US National Science Foundation;
Keywords
NUCLEOTIDE-RESOLUTION; COPY NUMBER; VARIANTS; BREAKPOINTS; DELETIONS; ALIGNMENT;
DOI
10.1186/1471-2105-13-S6-S6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Recent advances in sequencing technologies make it possible to comprehensively study structural variations (SVs) using sequence data from large-scale populations. Increasing effort has gone into developing methods that call SVs with exact breakpoints. Among these approaches, split-read mapping methods can be applied to low-coverage sequence data. As the amount of generated data grows, more efficient split-read mapping methods are needed. In addition, because sequencing errors cannot be avoided with current sequencing technologies, more accurate split-read mapping methods are needed to handle them better.
Results: In this paper, we present a split-read mapping method implemented in the program SVseq2, which improves on our previous work, SVseq1. Similar to SVseq1, SVseq2 calls deletions (and insertions) with exact breakpoints. SVseq2 achieves more accurate calling through split-read mapping within focal regions. SVseq2 also has a much-desired feature: there is no need to specify the maximum deletion size, whereas some existing split-read mapping methods need more memory and longer running time when a larger maximum deletion size is chosen. SVseq2 is also much faster because it only needs to examine a small number of ways of splitting the reads. Moreover, SVseq2 supports insertion calling from low-coverage sequence data, while SVseq1 only supports deletion finding. The program SVseq2 can be downloaded at http://www.engr.uconn.edu/~jiz08001/.
Conclusions: SVseq2 enables accurate and efficient SV calling through split-read mapping within focal regions using paired-end reads. On many simulated and real sequence data sets, SVseq2 outperforms some other existing approaches in accuracy and efficiency, especially when sequence coverage is low.
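To illustrate the general idea of split-read mapping for deletion calling mentioned in the abstract, the following is a minimal toy sketch, not SVseq2's algorithm or code. It assumes error-free reads, exact string matching, and a focal window supplied by the caller (in practice such a window might be derived from the mapped mate of a paired-end read). The function name `call_deletion` and its parameters are hypothetical.

```python
def call_deletion(reference, read, window_start, window_end, min_anchor=5):
    """Toy split-read deletion caller (hypothetical helper, exact matches only).

    Tries each way of splitting `read` into a prefix and a suffix, and looks
    for both pieces inside the focal window reference[window_start:window_end],
    with the suffix starting strictly after the end of the prefix.
    Returns the deleted interval as 0-based half-open reference coordinates
    (del_start, del_end), or None if no such split is found.
    """
    window = reference[window_start:window_end]
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        p = window.find(prefix)
        if p < 0:
            # A longer prefix cannot match if this one does not.
            break
        left_end = p + split
        # The suffix must map downstream, leaving a gap (the deletion).
        q = window.find(suffix, left_end + 1)
        if q >= 0:
            return (window_start + left_end, window_start + q)
    return None


# Build a small reference with a known 8-base segment to delete.
left, deleted, right = "ACGTTGCAAGGC", "TTTTTTTT", "GATCCGTAAGCT"
reference = left + deleted + right
read = left + right  # a read sequenced from a genome carrying the deletion

breakpoints = call_deletion(reference, read, 0, len(reference))
print(breakpoints)
```

Restricting the search to a focal window is what keeps the number of candidate placements small in this sketch. Note that the deletion size is never passed as a parameter here; it falls out of the two mapped positions.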
Pages: 11
Related papers
50 in total
  • [1] An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data
    Jin Zhang
    Jiayin Wang
    Yufeng Wu
    BMC Bioinformatics, 13
  • [2] An accurate assignment test for extremely low-coverage whole-genome sequence data
    Ferrari, Giada
    Atmore, Lane M.
    Jentoft, Sissel
    Jakobsen, Kjetill S.
    Makowiecki, Daniel
    Barrett, James H.
    Star, Bastiaan
    MOLECULAR ECOLOGY RESOURCES, 2022, 22 (04) : 1330 - 1344
  • [3] SVseq: an approach for detecting exact breakpoints of deletions with low-coverage sequence data
    Zhang, Jin
    Wu, Yufeng
    BIOINFORMATICS, 2011, 27 (23) : 3228 - 3234
  • [4] Accurate Genotype Imputation in Multiparental Populations from Low-Coverage Sequence
    Zheng, Chaozhi
    Boer, Martin P.
    van Eeuwijk, Fred A.
    GENETICS, 2018, 210 (01) : 71 - 82
  • [5] Comparing a few SNP calling algorithms using low-coverage sequencing data
    Yu, Xiaoqing
    Sun, Shuying
    BMC BIOINFORMATICS, 2013, 14
  • [6] Comparing a few SNP calling algorithms using low-coverage sequencing data
    Xiaoqing Yu
    Shuying Sun
    BMC Bioinformatics, 14
  • [7] CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks
    Wang, Jing
    Ling, Cheng
    Gao, Jingyang
    BIOMED RESEARCH INTERNATIONAL, 2017, 2017
  • [8] Fast imputation using medium or low-coverage sequence data
    Paul M. VanRaden
    Chuanyu Sun
    Jeffrey R. O’Connell
    BMC Genetics, 16
  • [9] Fast imputation using medium or low-coverage sequence data
    VanRaden, Paul M.
    Sun, Chuanyu
    O'Connell, Jeffrey R.
    BMC GENETICS, 2015, 16
  • [10] A Novel Approach to Estimating Heterozygosity from Low-Coverage Genome Sequence
    Bryc, Katarzyna
    Patterson, Nick
    Reich, David
    GENETICS, 2013, 195 (02) : 553 - 561