NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data

Times cited: 20
Authors
Fang, Li [1, 2, 3]
Hu, Jiang [1]
Wang, Depeng [1]
Wang, Kai [2, 3, 4, 5]
Affiliations
[1] Grandomics Biosci, Beijing 102206, Peoples R China
[2] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[3] Univ Penn, Dept Pathol & Lab Med, Perelman Sch Med, Philadelphia, PA 19104 USA
[4] Columbia Univ, Dept Biomed Informat, Med Ctr, New York, NY 10032 USA
[5] Columbia Univ, Inst Genom Med, Med Ctr, New York, NY 10032 USA
Source
BMC Bioinformatics | 2018 / Volume 19
Keywords
Long-read sequencing; Structural variants; Low coverage; PacBio; De novo mutations; Human genome; Disease; Mechanisms; Cancer
DOI
10.1186/s12859-018-2207-1
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers.
Results: In this study, we developed NextSV, a meta-caller to perform SV calling from low coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive/stringent) for different analysis purposes. We evaluated SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, compared with running any single SV caller, NextSV stringent call set had higher precision and balanced accuracy (F1 score) while NextSV sensitive call set had a higher recall. At 10X coverage, the recall of NextSV sensitive call set was 93.5 to 94.1% for deletions and 87.9 to 93.2% for insertions, indicating that ~10X coverage might be an optimal coverage to use in practice, considering the balance between the sequencing costs and the recall rates. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset.
Conclusions: Our results provide useful guidelines for SV detection from low coverage whole-genome PacBio data and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.
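The abstract compares call sets by precision, recall, and F1 score. As a minimal sketch of how these three metrics relate, the Python snippet below computes them from counts of true-positive, false-positive, and false-negative calls. The function name `precision_recall_f1` and the counts in the usage lines are hypothetical illustrations; they are not taken from the paper or from NextSV itself.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one SV call set.

    tp: calls that match a benchmark SV
    fp: calls with no matching benchmark SV
    fn: benchmark SVs that were not called
    """
    # Precision: fraction of reported calls that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of benchmark SVs that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Under this definition a stringent call set trades some recall for precision, and a sensitive call set does the opposite, which is the trade-off the abstract describes.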
Pages: 11