An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data

Cited by: 48
Authors
Zhang, Jin [1]
Wang, Jiayin [1]
Wu, Yufeng [1]
Affiliation
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
US National Science Foundation;
Keywords
NUCLEOTIDE-RESOLUTION; COPY NUMBER; VARIANTS; BREAKPOINTS; DELETIONS; ALIGNMENT;
DOI
10.1186/1471-2105-13-S6-S6
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Background: Recent advances in sequencing technologies make it possible to study structural variations (SVs) comprehensively using sequence data from large-scale populations. Increasing effort has gone into developing methods that call SVs with exact breakpoints. Among these approaches, split-read mapping methods can be applied to low-coverage sequence data. As the volume of generated data grows, more efficient split-read mapping methods are still needed. Moreover, since sequencing errors cannot be avoided with current sequencing technologies, more accurate split-read mapping methods that better handle these errors are also needed.

Results: In this paper, we present a split-read mapping method, implemented in the program SVseq2, which improves on our previous work, SVseq1. Like SVseq1, SVseq2 calls deletions with exact breakpoints, and it can also call insertions. SVseq2 achieves more accurate calling by performing split-read mapping within focal regions. SVseq2 also has a highly desirable feature: there is no need to specify a maximum deletion size, whereas some existing split-read mapping methods require more memory and longer running time when a larger maximum deletion size is chosen. SVseq2 is also much faster because it only needs to examine a small number of ways of splitting each read. Moreover, SVseq2 supports insertion calling from low-coverage sequence data, while SVseq1 only supports deletion finding. The program SVseq2 can be downloaded at http://www.engr.uconn.edu/~jiz08001/.

Conclusions: SVseq2 enables accurate and efficient SV calling through split-read mapping within focal regions using paired-end reads. On many simulated and real sequence data sets, SVseq2 outperforms some other existing approaches in accuracy and efficiency, especially when sequence coverage is low.
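To make the basic idea of split-read deletion calling within a focal region concrete, here is a minimal Python sketch. It is not SVseq2's algorithm: it assumes the read's mate has already placed it near a reference window (the "focal region"), uses exact matching only (so a single sequencing error in either part defeats the call), and naively tries every split point rather than the small number of splits SVseq2 examines. The names `find_all`, `call_deletions` and the `min_anchor` parameter are hypothetical. Because the whole window is searched, no maximum deletion size needs to be set.

```python
from typing import List, Tuple


def find_all(text: str, pattern: str, start: int = 0) -> List[int]:
    """Return every index >= start where `pattern` occurs exactly in `text` (overlaps allowed)."""
    hits: List[int] = []
    i = text.find(pattern, start)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def call_deletions(read: str, focal: str, min_anchor: int = 5) -> List[Tuple[int, int]]:
    """Toy split-read deletion caller (hypothetical; not SVseq2's algorithm).

    The read is split into prefix/suffix at every position leaving at least
    `min_anchor` bases on each side. Both parts are matched exactly (no
    sequencing errors tolerated) against `focal`, a reference window assumed
    to have been chosen via the read's mapped mate. A call (start, end) means
    focal[start:end] is missing from the read. The whole window is searched,
    so no maximum deletion size has to be specified.
    """
    calls = set()
    for k in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:k], read[k:]
        for s in find_all(focal, prefix):
            left = s + k  # first deleted reference base
            # The suffix must map strictly downstream, i.e. at least 1 base deleted.
            for t in find_all(focal, suffix, left + 1):
                calls.add((left, t))
    return sorted(calls)


if __name__ == "__main__":
    focal = "ACGTACCTGATTGGCCACGCATCGAGTC"
    read = focal[:10] + focal[18:]  # read lacks focal[10:18]
    print(call_deletions(read, focal))  # [(10, 18)]
```

In repetitive sequence the same deletion can be explained at several shifted positions, so a sketch like this may report more than one equivalent call; a real caller would need to normalize or report such breakpoint ambiguity.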
Pages: 11
Related papers (50 in total)
  • [41] Detecting Pathogenic Structural Variants with Low-Coverage PacBio Sequencing
    Hickey, L.
    Wenger, A. M.
    Baybayan, P.
    Peluso, P.
    Korlach, J.
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2018, 26 : 729 - 729
  • [42] A protocol for applying low-coverage whole-genome sequencing data in structural variation studies
    Liu, Qi
    Xie, Bo
    Gao, Yang
    Xu, Shuhua
    Lu, Yan
    STAR PROTOCOLS, 2023, 4 (03):
  • [44] Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data
    Kobayashi, Masaaki
    Ohyanagi, Hajime
    Takanashi, Hideki
    Asano, Satomi
    Kudo, Toru
    Kajiya-Kanegae, Hiromi
    Nagano, Atsushi J.
    Tainaka, Hitoshi
    Tokunaga, Tsuyoshi
    Sazuka, Takashi
    Iwata, Hiroyoshi
    Tsutsumi, Nobuhiro
    Yano, Kentaro
    DNA RESEARCH, 2017, 24 (04) : 397 - 405
  • [46] Kinship Estimation Based on Extremely Low-Coverage Sequencing Data
    Dou, Jinzhuang
    Chothani, Sonia
    Sim, Xueling
    Hughes, Jason D.
    Reilly, Dermot F.
    Tai, E. Shyong
    Liu, Jianjun
    Wang, Chaolong
    GENETIC EPIDEMIOLOGY, 2016, 40 (07) : 619 - 620
  • [47] A novel nonlinear dimension reduction approach to infer population structure for low-coverage sequencing data
    Zhang, Miao
    Liu, Yiwen
    Zhou, Hua
    Watkins, Joseph
    Zhou, Jin
    BMC BIOINFORMATICS, 2021, 22 (01) : 348
  • [48] Publisher Correction: Efficient phasing and imputation of low-coverage sequencing data using large reference panels
    Rubinacci, Simone
    Ribeiro, Diogo M.
    Hofmeister, Robin J.
    Delaneau, Olivier
    NATURE GENETICS, 2021, 53 : 412 - 412
  • [49] Structural determination of the low-coverage phase of Al on Si(001) surface
    Park, JY
    Seo, JH
    Whang, CN
    Kim, SS
    Choi, DS
    Chae, KH
    JOURNAL OF CHEMICAL PHYSICS, 2005, 122 (24):
  • [50] Efficient Imputation of Missing Markers in Low-Coverage Genotyping-by-Sequencing Data from Multiparental Crosses
    Huang, B. Emma
    Raghavan, Chitra
    Mauleon, Ramil
    Broman, Karl W.
    Leung, Hei
    GENETICS, 2014, 197 (01) : 401 - 404