Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

Cited by: 81
Authors
Vollger, Mitchell R. [1]
Logsdon, Glennis A. [1]
Audano, Peter A. [1]
Sulovari, Arvis [1]
Porubsky, David [1]
Peluso, Paul [2]
Wenger, Aaron M. [2]
Concepcion, Gregory T. [2]
Kronenberg, Zev N. [2]
Munson, Katherine M. [1]
Baker, Carl [1]
Sanders, Ashley D. [3]
Spierings, Diana C. J. [4]
Lansdorp, Peter M. [4, 5, 6]
Surti, Urvashi [7, 8]
Hunkapiller, Michael W. [2]
Eichler, Evan E. [1, 9]
Affiliations
[1] Univ Washington, Dept Genome Sci, Sch Med, 3720 15th Ave NE S413C, Box 355065, Seattle, WA 98195 USA
[2] Pacific Biosci Calif, Menlo Pk, CA USA
[3] European Mol Biol Lab, Genome Biol Unit, Heidelberg, Germany
[4] Univ Groningen, Univ Med Ctr Groningen, European Res Inst Biol Ageing, Groningen, Netherlands
[5] BC Canc Agcy, Terry Fox Lab, Vancouver, BC, Canada
[6] Univ British Columbia, Dept Med Genet, Vancouver, BC, Canada
[7] Univ Pittsburgh, Sch Med, Dept Pathol, Pittsburgh, PA USA
[8] Univ Pittsburgh, Med Ctr, Pittsburgh, PA USA
[9] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
Funding
European Research Council; US National Institutes of Health
Keywords
genome assembly; long-read sequencing; segmental duplications; structural variation; tandem repeats; REGIONS;
DOI
10.1111/ahg.12364
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
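The abstract reports continuity as NG50 and quotes a 2.5-fold gain near centromeres. As a minimal sketch of how such a figure is typically computed, the Python below uses the common definition of NG50 (the contig length at which the longest contigs together cover at least half of an assumed genome size). The function name ng50, the toy contig lengths, and the genome size are illustrative assumptions, not the authors' pipeline, and how the paper restricts the calculation to the 1 Mbp pericentromeric window is not stated in this record. Only the two NG50 values in the final lines come from the abstract.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length of the contig at which the cumulative length of
    contigs, taken from longest to shortest, first reaches half of the
    (estimated) genome size. Returns 0 if the assembly never reaches it.
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return 0  # assembly spans less than half of the genome size


# Toy data (hypothetical): five contigs against an assumed genome size of 200.
toy_contigs = [50, 40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, genome_size=200)

# Fold change from the NG50 values quoted in the abstract (kbp).
hifi_ng50_kbp = 480.6
clr_ng50_kbp = 191.5
fold_change = hifi_ng50_kbp / clr_ng50_kbp

print(f"toy NG50: {toy_ng50}")
print(f"HiFi/CLR NG50 fold change: {fold_change:.2f}")
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses a fixed genome size, so two assemblies of the same genome can be compared directly even when their total lengths differ.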
Pages: 125-140
Number of pages: 16