NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data

Cited by: 20
Authors
Fang, Li [1, 2, 3]
Hu, Jiang [1]
Wang, Depeng [1]
Wang, Kai [2, 3, 4, 5]
Affiliations
[1] Grandomics Biosci, Beijing 102206, Peoples R China
[2] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[3] Univ Penn, Dept Pathol & Lab Med, Perelman Sch Med, Philadelphia, PA 19104 USA
[4] Columbia Univ, Dept Biomed Informat, Med Ctr, New York, NY 10032 USA
[5] Columbia Univ, Inst Genom Med, Med Ctr, New York, NY 10032 USA
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Keywords
Long-read sequencing; Structural variants; Low coverage; PacBio; DE-NOVO MUTATIONS; HUMAN GENOME; DISEASE; MECHANISMS; CANCER;
DOI
10.1186/s12859-018-2207-1
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers. Results: In this study, we developed NextSV, a meta-caller to perform SV calling from low-coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive/stringent) for different analysis purposes. We evaluated the SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, compared with running any single SV caller, the NextSV stringent call set had higher precision and balanced accuracy (F1 score), while the NextSV sensitive call set had higher recall. At 10X coverage, the recall of the NextSV sensitive call set was 93.5 to 94.1% for deletions and 87.9 to 93.2% for insertions, indicating that ~10X coverage might be an optimal coverage to use in practice, considering the balance between sequencing costs and recall rates. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset. Conclusions: Our results provide useful guidelines for SV detection from low-coverage whole-genome PacBio data, and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.
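Note: the precision, recall, and F1 score named in the abstract follow the standard definitions, which can be sketched as below. This is a minimal illustration and not NextSV code; the function name `precision_recall_f1` and the counts used are hypothetical and are not results from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from counts of true positives,
    false positives, and false negatives. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only (not taken from the paper).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Under these definitions, a call set that reports more calls tends to gain recall at the cost of precision, which is the trade-off the sensitive and stringent call sets are described as addressing.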
Pages: 11
Related papers
50 in total
  • [41] A beginner's guide to assembling a draft genome and analyzing structural variants with long-read sequencing technologies
    Kim, Jun
    Kim, Chuna
    STAR PROTOCOLS, 2022, 3 (03):
  • [42] Pangenome obtained by long-read sequencing of 11 genomes reveal hidden functional structural variants in pigs
    Jiang, Yi-Fan
    Wang, Sheng
    Wang, Chong-Long
    Xu, Ru-Hai
    Wang, Wen-Wen
    Jiang, Yao
    Wang, Ming-Shan
    Jiang, Li
    Dai, Li-He
    Wang, Jie-Ru
    Chu, Xiao-Hong
    Zeng, Yong-Qing
    Fang, Ling-Zhao
    Wu, Dong-Dong
    Zhang, Qin
    Ding, Xiang-Dong
    ISCIENCE, 2023, 26 (03)
  • [43] Likely pathogenic structural variants in genetically unsolved patients with retinitis pigmentosa revealed by long-read sequencing
    Sano, Yusuke
    Koyanagi, Yoshito
    Wong, Jing Hao
    Murakami, Yusuke
    Fujiwara, Kohta
    Endo, Mikiko
    Aoi, Tomomi
    Hashimoto, Kazuki
    Nakazawa, Toru
    Wada, Yuko
    Ueno, Shinji
    Gao, Dan
    Murakami, Akira
    Hotta, Yoshihiro
    Ikeda, Yasuhiro
    Nishiguchi, Koji M.
    Momozawa, Yukihide
    Sonoda, Koh-Hei
    Akiyama, Masato
    Fujimoto, Akihiro
    JOURNAL OF MEDICAL GENETICS, 2022, 59 (11) : 1133 - 1138
  • [44] SvAnna: efficient and accurate pathogenicity prediction of coding and regulatory structural variants in long-read genome sequencing
    Daniel Danis
    Julius O. B. Jacobsen
    Parithi Balachandran
    Qihui Zhu
    Feyza Yilmaz
    Justin Reese
    Matthias Haimel
    Gholson J. Lyon
    Ingo Helbig
    Christopher J. Mungall
    Christine R. Beck
    Charles Lee
    Damian Smedley
    Peter N. Robinson
    Genome Medicine, 14
  • [45] A protocol for applying low-coverage whole-genome sequencing data in structural variation studies
    Liu, Qi
    Xie, Bo
    Gao, Yang
    Xu, Shuhua
    Lu, Yan
    STAR PROTOCOLS, 2023, 4 (03):
  • [46] On detection of somatic structural variation in highly repetitive regions using long-read sequencing data
    Shiraishi, Yuichi
    CANCER SCIENCE, 2024, 115 : 31 - 31
  • [47] PolyAtailor: measuring poly(A) tail length from short-read and long-read sequencing data
    Liu, Mengfei
    Hao, Linlin
    Yang, Sien
    Wu, Xiaohui
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (04)
  • [48] Resolution of ring chromosomes, Robertsonian translocations, and complex structural variants from long-read sequencing and telomere-to-telomere assembly
    Mostovoy, Yulia
    Boone, Philip M.
    Huang, Yongqing
    Garimella, Kiran V.
    Tan, Kar-Tong
    Russell, Bianca E.
    Salani, Monica
    Esch, Celine E. F. de
    Lemanski, John
    Curall, Benjamin
    Hauenstein, Jen
    Lucente, Diane
    Bowers, Tera
    Desmet, Tim
    Gabriel, Stacey
    Morton, Cynthia C.
    Meyerson, Matthew
    Hastie, Alex R.
    Gusella, James
    Quintero-Rivera, Fabiola
    Brand, Harrison
    Talkowski, Michael E.
    AMERICAN JOURNAL OF HUMAN GENETICS, 2024, 111 (12)
  • [49] Long-Read Sequencing Resolves Complex Structural Variants and Identifies Missing Disease-Causing Variants in Unsolved Cases of Hemophilia
    Miller, Danny E.
    Galey, Miranda
    Fletcher, Shelley N.
    Lannert, Kerry
    Wheeler, Marsha M.
    Kandhaya-Pillai, Renuka
    Oshima, Junko
    Konkle, Barbara A.
    Eichler, Evan E.
    Johnsen, Jill M.
    BLOOD, 2022, 140 : 10716 - 10717
  • [50] Best practices for genotype imputation from low-coverage sequencing data in natural populations
    Watowich, Marina M.
    Chiou, Kenneth L.
    Graves, Brian
    Montague, Michael J.
    Brent, Lauren J. N.
    Higham, James P.
    Horvath, Julie E.
    Lu, Amy
    Martinez, Melween I.
    Platt, Michael L.
    Schneider-Crease, India A.
    Lea, Amanda J.
    Snyder-Mackler, Noah
    MOLECULAR ECOLOGY RESOURCES, 2023,