An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data

Cited by: 48

Authors:

Zhang, Jin [1]

Wang, Jiayin [1]

Wu, Yufeng [1]

Affiliations:

[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA

Source:

BMC BIOINFORMATICS | 2012 / Volume 13

Funding:

U.S. National Science Foundation;

Keywords:

NUCLEOTIDE-RESOLUTION; COPY NUMBER; VARIANTS; BREAKPOINTS; DELETIONS; ALIGNMENT;

DOI:

10.1186/1471-2105-13-S6-S6

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Background: Recent advances in sequencing technologies make it possible to comprehensively study structural variations (SVs) using sequence data from large-scale populations. Increasing effort has gone into developing methods that call SVs with exact breakpoints. Among these approaches, split-read mapping methods can be applied to low-coverage sequence data. As the amount of data generated grows, more efficient split-read mapping methods are needed. Also, since sequence errors cannot be avoided with current sequencing technologies, more accurate split-read mapping methods are needed to better handle such errors. Results: In this paper, we present a split-read mapping method implemented in the program SVseq2, which improves on our previous work, SVseq1. Like SVseq1, SVseq2 calls deletions (and insertions) with exact breakpoints. SVseq2 achieves more accurate calling through split-read mapping within focal regions. SVseq2 also has a much-desired feature: there is no need to specify the maximum deletion size, whereas some existing split-read mapping methods need more memory and longer running time when a larger maximum deletion size is chosen. SVseq2 is also much faster because it only needs to examine a small number of ways of splitting the reads. Moreover, SVseq2 supports insertion calling from low-coverage sequence data, while SVseq1 only supports deletion finding. The program SVseq2 can be downloaded at http://www.engr.uconn.edu/~jiz08001/. Conclusions: SVseq2 enables accurate and efficient SV calling through split-read mapping within focal regions using paired-end reads. On many simulated and real sequence data sets, SVseq2 outperforms some other existing approaches in accuracy and efficiency, especially when sequence coverage is low.


Pages: 11

