Comprehensive evaluation of structural variant genotyping methods based on long-read sequencing data

Times cited: 12
Authors
Duan, Xiaoke [1, 2]
Pan, Mingpei [1, 2]
Fan, Shaohua [1]
Affiliations
[1] Fudan Univ, Zhangjiang Fudan Int Innovat Ctr, Human Phenome Inst, State Key Lab Genet Engn, Shanghai 200438, Peoples R China
[2] Fudan Univ, Sch Life Sci, Dept Anthropol & Human Genet, MOE Key Lab Contemporary Anthropol, Shanghai 200433, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Long-read sequencing; SV genotyping; F1 score; Performance evaluation; EVOLUTION; SELECTION; MUTATION; IMPACT
DOI
10.1186/s12864-022-08548-y
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract

Background: Structural variants (SVs) play a crucial role in gene regulation, trait association, and disease in humans. SV genotyping has been widely applied in genomics research and clinical diagnosis. Although a growing number of SV genotyping methods for long reads have been developed, no comprehensive performance assessment of these methods has yet been carried out.

Results: Using one simulated and three real SV datasets, we performed an in-depth evaluation of five SV genotyping methods: cuteSV, LRcaller, Sniffles, SVJedi, and VaPoR.

For insertions and deletions, cuteSV and LRcaller achieve similar F1 scores and outperform the other methods. cuteSV scores 0.69–0.90 on insertions and 0.77–0.90 on deletions. LRcaller scores 0.67–0.87 on insertions and 0.74–0.91 on deletions.

For duplications, inversions, and translocations, LRcaller yields the most accurate genotypes, with F1 scores of 0.84, 0.68, and 0.47, respectively.

For SVs located in tandem repeat regions or with imprecise breakpoints, cuteSV (insertions and deletions) and LRcaller (duplications, inversions, and translocations) outperform the other methods.

In addition, we observed that F1 scores decrease as SV size increases. Finally, our analyses suggest that the F1 scores of these methods reach the point of diminishing returns at 20× depth of coverage.

Conclusions: We present an in-depth benchmark study of long-read SV genotyping methods. Our results highlight the advantages and disadvantages of each genotyping method. They provide practical guidance for selecting the optimal tool and suggest directions for future tool improvement.
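The abstract reports genotyping accuracy as F1 scores. A genotype-level F1 score can be computed roughly as in the minimal Python sketch below. It is not the authors' evaluation code. Several choices here are assumptions made only for this illustration:

- the function name `genotype_f1`;
- the "0/0", "0/1", "1/1" genotype encoding;
- treating a missing call as "./.";
- the counting rule. A call counts as a true positive only when its non-reference genotype exactly matches the truth. A mismatched non-reference call counts as both a false positive and a false negative.

```python
from typing import Dict


def genotype_f1(truth: Dict[str, str], predicted: Dict[str, str]) -> Dict[str, float]:
    """Hypothetical genotype-level precision, recall, and F1.

    `truth` and `predicted` map an SV identifier to a genotype string
    ("0/0", "0/1" or "1/1"). Only SVs in the truth set are scored;
    an SV absent from `predicted` is treated as a no-call ("./.").
    """
    non_ref = ("0/1", "1/1")
    tp = fp = fn = 0
    for sv_id, true_gt in truth.items():
        pred_gt = predicted.get(sv_id, "./.")
        if pred_gt in non_ref and pred_gt == true_gt:
            tp += 1  # correct non-reference genotype
        else:
            if pred_gt in non_ref:
                fp += 1  # non-reference call that does not match the truth
            if true_gt in non_ref:
                fn += 1  # true non-reference genotype that was not recovered
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = {"sv1": "0/1", "sv2": "1/1", "sv3": "0/0", "sv4": "1/1"}
    predicted = {"sv1": "0/1", "sv2": "0/1", "sv3": "0/1"}  # sv4 not genotyped
    result = genotype_f1(truth, predicted)
    print(round(result["f1"], 3))  # → 0.333
```

In this toy example, one call is exact (sv1). The sv2 call has the wrong zygosity, the sv3 call is a false alternate call, and sv4 is missed. This gives TP = 1, FP = 2, and FN = 2, so precision, recall, and F1 are all 1/3. Published benchmarks may instead use zygosity-agnostic matching or breakpoint tolerance windows, which would change these counts.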
Pages: 14