Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

Cited by: 81
Authors
Vollger, Mitchell R. [1]
Logsdon, Glennis A. [1]
Audano, Peter A. [1]
Sulovari, Arvis [1]
Porubsky, David [1]
Peluso, Paul [2]
Wenger, Aaron M. [2]
Concepcion, Gregory T. [2]
Kronenberg, Zev N. [2]
Munson, Katherine M. [1]
Baker, Carl [1]
Sanders, Ashley D. [3]
Spierings, Diana C. J. [4]
Lansdorp, Peter M. [4, 5, 6]
Surti, Urvashi [7, 8]
Hunkapiller, Michael W. [2]
Eichler, Evan E. [1, 9]
Affiliations
[1] Univ Washington, Dept Genome Sci, Sch Med, 3720 15th Ave NE S413C,Box 355065, Seattle, WA 98195 USA
[2] Pacific Biosci Calif, Menlo Pk, CA USA
[3] European Mol Biol Lab, Genome Biol Unit, Heidelberg, Germany
[4] Univ Groningen, Univ Med Ctr Groningen, European Res Inst Biol Ageing, Groningen, Netherlands
[5] BC Canc Agcy, Terry Fox Lab, Vancouver, BC, Canada
[6] Univ British Columbia, Dept Med Genet, Vancouver, BC, Canada
[7] Univ Pittsburgh, Sch Med, Dept Pathol, Pittsburgh, PA USA
[8] Univ Pittsburgh, Med Ctr, Pittsburgh, PA USA
[9] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
Funding
European Research Council; US National Institutes of Health
Keywords
genome assembly; long-read sequencing; segmental duplications; structural variation; tandem repeats; REGIONS;
DOI
10.1111/ahg.12364
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
Pages: 125-140
Page count: 16
Related papers
50 in total
  • [21] Assembly and diploid architecture of an individual human genome via single-molecule technologies
    Pendleton, Matthew
    Sebra, Robert
    Pang, Andy Wing Chun
    Ummat, Ajay
    Franzen, Oscar
    Rausch, Tobias
    Stuetz, Adrian M.
    Stedman, William
    Anantharaman, Thomas
    Hastie, Alex
    Dai, Heng
    Fritz, Markus Hsi-Yang
    Cao, Han
    Cohain, Ariella
    Deikus, Gintaras
    Durrett, Russell E.
    Blanchard, Scott C.
    Altman, Roger
    Chin, Chen-Shan
    Guo, Yan
    Paxinos, Ellen E.
    Korbel, Jan O.
    Darnell, Robert B.
    McCombie, W. Richard
    Kwok, Pui-Yan
    Mason, Christopher E.
    Schadt, Eric E.
    Bashir, Ali
    NATURE METHODS, 2015, 12 (08) : 780 - 786
  • [22] Resolving the complexity of the human genome using single-molecule sequencing
    Mark J. P. Chaisson
    John Huddleston
    Megan Y. Dennis
    Peter H. Sudmant
    Maika Malig
    Fereydoun Hormozdiari
    Francesca Antonacci
    Urvashi Surti
    Richard Sandstrom
    Matthew Boitano
    Jane M. Landolin
    John A. Stamatoyannopoulos
    Michael W. Hunkapiller
    Jonas Korlach
    Evan E. Eichler
    Nature, 2015, 517 : 608 - 611
  • [23] Resolving the complexity of the human genome using single-molecule sequencing
    Chaisson, Mark J. P.
    Huddleston, John
    Dennis, Megan Y.
    Sudmant, Peter H.
    Malig, Maika
    Hormozdiari, Fereydoun
    Antonacci, Francesca
    Surti, Urvashi
    Sandstrom, Richard
    Boitano, Matthew
    Landolin, Jane M.
    Stamatoyannopoulos, John A.
    Hunkapiller, Michael W.
    Korlach, Jonas
    Eichler, Evan E.
    NATURE, 2015, 517 (7536) : 608 - U163
  • [24] Improved microbial genomes and gene catalog of the chicken gut from metagenomic sequencing of high-fidelity long reads
    Zhang, Yan
    Jiang, Fan
    Yang, Boyuan
    Wang, Sen
    Wang, Hengchao
    Wang, Anqi
    Xu, Dong
    Fan, Wei
    GIGASCIENCE, 2022, 11
  • [26] Translocator: local realignment and global remapping enabling accurate translocation detection using single-molecule sequencing long reads
    Wu, Ye
    Luo, Ruibang
    Lam, Tak-Wah
    Ting, Hing-Fung
    Wang, Junwen
    ACM-BCB 2020 - 11TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2020,
  • [27] High-resolution human genome structure by single-molecule analysis
    Teague, Brian
    Waterman, Michael S.
    Goldstein, Steven
    Potamousis, Konstantinos
    Zhou, Shiguo
    Reslewic, Susan
    Sarkar, Deepayan
    Valouev, Anton
    Churas, Christopher
    Kidd, Jeffrey M.
    Kohn, Scott
    Runnheim, Rodney
    Lamers, Casey
    Forrest, Dan
    Newton, Michael A.
    Eichler, Evan E.
    Kent-First, Marijo
    Surti, Urvashi
    Livny, Miron
    Schwartz, David C.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (24) : 10848 - 10853
  • [28] High-Fidelity Single Molecule Quantification in a Flow Cytometer Using Multiparametric Optical Analysis
    Smith, Lucas D.
    Liu, Yang
    Zahid, Mohammad U.
    Canady, Taylor D.
    Wang, Liang
    Kohli, Manish
    Cunningham, Brian T.
    Smith, Andrew M.
    ACS NANO, 2020, 14 (02) : 2324 - 2335
  • [29] Haplotype-resolved assembly of the mule duck genome using high-fidelity sequencing technology
    Che, Tiandong
    Li, Jing
    Li, Xiaobo
    Wang, Zhongsi
    Zhang, Xuemei
    Yang, Weifei
    Liu, Tao
    Wang, Yan
    Wang, Kaiqian
    Gao, Tian
    Shen, Guangqiang
    Qiu, Wanling
    Li, Zhimin
    Zhang, Wenguang
    PLOS ONE, 2024, 19 (07):
  • [30] LongStitch: high-quality genome assembly correction and scaffolding using long reads
    Lauren Coombe
    Janet X. Li
    Theodora Lo
    Johnathan Wong
    Vladimir Nikolic
    René L. Warren
    Inanc Birol
    BMC Bioinformatics, 22