Comprehensive evaluation of structural variant genotyping methods based on long-read sequencing data

Cited by: 12
Authors
Duan, Xiaoke [1,2]
Pan, Mingpei [1,2]
Fan, Shaohua [1]
Affiliations
[1] Fudan Univ, Zhangjiang Fudan Int Innovat Ctr, Human Phenome Inst, State Key Lab Genet Engn, Shanghai 200438, Peoples R China
[2] Fudan Univ, Sch Life Sci, Dept Anthropol & Human Genet, MOE Key Lab Contemporary Anthropol, Shanghai 200433, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Long-read sequencing; SV genotyping; F1 score; Performance evaluation; EVOLUTION; SELECTION; MUTATION; IMPACT
DOI
10.1186/s12864-022-08548-y
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Structural variants (SVs) play a crucial role in gene regulation, trait association, and disease in humans. SV genotyping has been extensively applied in genomics research and clinical diagnosis. Although a growing number of SV genotyping methods for long reads have been developed, a comprehensive performance assessment of these methods is still lacking.
Results: Based on one simulated and three real SV datasets, we performed an in-depth evaluation of five SV genotyping methods: cuteSV, LRcaller, Sniffles, SVJedi, and VaPoR. The results show that for insertions and deletions, cuteSV and LRcaller have similar F1 scores (cuteSV, insertions: 0.69-0.90, deletions: 0.77-0.90; LRcaller, insertions: 0.67-0.87, deletions: 0.74-0.91) and are superior to the other methods. For duplications, inversions, and translocations, LRcaller yields the most accurate genotyping results (F1 scores of 0.84, 0.68, and 0.47, respectively). When genotyping SVs located in tandem repeat regions or with imprecise breakpoints, cuteSV (insertions and deletions) and LRcaller (duplications, inversions, and translocations) perform better than the other methods. In addition, we observed that F1 scores decreased as SV size increased. Finally, our analyses suggest that the F1 scores of these methods reach the point of diminishing returns at 20x depth of coverage.
Conclusions: We present an in-depth benchmark study of long-read SV genotyping methods. Our results highlight the advantages and disadvantages of each genotyping method, providing practical guidance for selecting the most suitable tool and indicating prospective directions for tool improvement.
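Note on the metric: the abstract reports F1 scores but does not spell out how they were computed. As a rough illustration only, the sketch below uses the standard definition F1 = 2 * precision * recall / (precision + recall), and assumes that a call counts as correct when its non-reference genotype exactly matches the truth genotype for the same SV. The function name, the SV identifiers, and that matching rule are assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, Tuple

# Genotypes are written VCF-style: "0/0" (reference), "0/1", "1/1", "./." (missing).
Genotype = str


def f1_from_genotypes(
    truth: Dict[str, Genotype], calls: Dict[str, Genotype]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for genotype calls against a truth set.

    Assumption for this sketch: only non-reference, non-missing genotypes
    are counted, and a true positive requires an exact genotype match.
    """

    def is_non_ref(gt: Genotype) -> bool:
        return gt not in ("0/0", "./.")

    # True positives: non-reference calls whose genotype equals the truth genotype.
    tp = sum(
        1 for sv_id, gt in calls.items() if is_non_ref(gt) and truth.get(sv_id) == gt
    )
    n_called = sum(1 for gt in calls.values() if is_non_ref(gt))
    n_expected = sum(1 for gt in truth.values() if is_non_ref(gt))

    precision = tp / n_called if n_called else 0.0
    recall = tp / n_expected if n_expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    truth = {"sv1": "0/1", "sv2": "1/1", "sv3": "0/1", "sv4": "0/0"}
    calls = {"sv1": "0/1", "sv2": "0/1", "sv3": "./.", "sv4": "0/1"}
    print(f1_from_genotypes(truth, calls))
```

Under this definition a wrong-zygosity call (sv2 above) lowers both precision and recall, which is one reason genotyping F1 scores tend to be lower than detection-only F1 scores on the same data.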
Pages: 14