NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data

Cited by: 20
Authors
Fang, Li [1, 2, 3]
Hu, Jiang [1]
Wang, Depeng [1]
Wang, Kai [2, 3, 4, 5]
Affiliations
[1] Grandomics Biosci, Beijing 102206, Peoples R China
[2] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[3] Univ Penn, Dept Pathol & Lab Med, Perelman Sch Med, Philadelphia, PA 19104 USA
[4] Columbia Univ, Dept Biomed Informat, Med Ctr, New York, NY 10032 USA
[5] Columbia Univ, Inst Genom Med, Med Ctr, New York, NY 10032 USA
Source
BMC BIOINFORMATICS | 2018 / Volume 19
Keywords
Long-read sequencing; Structural variants; Low coverage; PacBio; DE-NOVO MUTATIONS; HUMAN GENOME; DISEASE; MECHANISMS; CANCER;
DOI
10.1186/s12859-018-2207-1
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer reads than short-read sequencing and may greatly improve SV detection. However, because long-read sequencing is relatively expensive, it is unclear what coverage is needed and how best to use the available aligners and SV callers.
Results: In this study, we developed NextSV, a meta-caller that performs SV calling from low-coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive and stringent) for different analysis purposes. We evaluated the SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Compared with running any single SV caller, the NextSV stringent call set had higher precision and balanced accuracy (F1 score), while the NextSV sensitive call set had higher recall. At 10X coverage, the recall of the NextSV sensitive call set was 93.5 to 94.1% for deletions and 87.9 to 93.2% for insertions, indicating that ~10X might be an optimal coverage to use in practice, considering the balance between sequencing cost and recall. We further evaluated Mendelian errors on an Ashkenazi Jewish trio dataset.
Conclusions: Our results provide useful guidelines for SV detection from low-coverage whole-genome PacBio data, and we expect that NextSV will facilitate the analysis of SVs in long-read sequencing data.
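The abstract compares call sets by precision, recall, and F1 score. As a reminder of how these three quantities relate, here is a minimal Python sketch using the standard definitions. The function name `sv_metrics` and the example counts are hypothetical illustrations; they are not taken from the paper or from the NextSV software.

```python
def sv_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative SV call counts, using the standard definitions.

    Hypothetical helper for illustration only; not part of NextSV.
    """
    # Precision: fraction of reported calls that match the truth set.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of truth-set variants that were reported.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts, chosen only to show the arithmetic.
precision, recall, f1 = sv_metrics(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, a call set tuned for high recall at the expense of precision (or the reverse) scores lower on F1 than one that balances the two, which is consistent with the abstract reporting F1 for the stringent set and recall for the sensitive set.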
Pages: 11