Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

Times cited: 81
Authors
Vollger, Mitchell R. [1]
Logsdon, Glennis A. [1]
Audano, Peter A. [1]
Sulovari, Arvis [1]
Porubsky, David [1]
Peluso, Paul [2]
Wenger, Aaron M. [2]
Concepcion, Gregory T. [2]
Kronenberg, Zev N. [2]
Munson, Katherine M. [1]
Baker, Carl [1]
Sanders, Ashley D. [3]
Spierings, Diana C. J. [4]
Lansdorp, Peter M. [4, 5, 6]
Surti, Urvashi [7, 8]
Hunkapiller, Michael W. [2]
Eichler, Evan E. [1, 9]
Affiliations
[1] University of Washington, Department of Genome Sciences, School of Medicine, 3720 15th Ave NE S413C, Box 355065, Seattle, WA 98195, USA
[2] Pacific Biosciences of California, Menlo Park, CA, USA
[3] European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
[4] University of Groningen, University Medical Center Groningen, European Research Institute for the Biology of Ageing, Groningen, Netherlands
[5] BC Cancer Agency, Terry Fox Laboratory, Vancouver, BC, Canada
[6] University of British Columbia, Department of Medical Genetics, Vancouver, BC, Canada
[7] University of Pittsburgh, School of Medicine, Department of Pathology, Pittsburgh, PA, USA
[8] University of Pittsburgh Medical Center, Pittsburgh, PA, USA
[9] University of Washington, Howard Hughes Medical Institute, Seattle, WA 98195, USA
Funding
European Research Council; US National Institutes of Health
Keywords
genome assembly; long-read sequencing; segmental duplications; structural variation; tandem repeats; regions
DOI
10.1111/ahg.12364
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
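The abstract reports continuity as NG50. As a reminder of how that metric is computed, here is a minimal Python sketch, assuming the standard definition: sort contigs longest-first and report the length of the contig at which the running total first reaches half of a given genome (or region) size. The function name `ng50` and the toy contig lengths are hypothetical illustrations, not taken from the study's pipeline.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50: the length of the contig at which contigs sorted
    longest-first first cover at least half of genome_size.

    Returns 0 if the whole assembly covers less than half of genome_size.
    """
    half = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0


# Hypothetical toy assembly: five contigs against a 400-bp "genome".
contigs = [100, 80, 50, 30, 20]
print(ng50(contigs, 400))  # 100 + 80 + 50 = 230 >= 200, so prints 50
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the expected genome (or region) size, so assemblies of different total length are compared on the same footing. Applied to the reported values, 480.6 kbp / 191.5 kbp ≈ 2.51, which is the 2.5-fold increase stated in the abstract.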
Pages: 125-140
Page count: 16