In vitro, long-range sequence information for de novo genome assembly via transposase contiguity

Cited by: 128
Authors
Adey, Andrew [1]
Kitzman, Jacob O. [1]
Burton, Joshua N. [1]
Daza, Riza [1]
Kumar, Akash [1]
Christiansen, Lena [2]
Ronaghi, Mostafa [2]
Amini, Sasan [2]
Gunderson, Kevin L. [2]
Steemers, Frank J. [2]
Shendure, Jay [1]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98115 USA
[2] Illumina Inc, Adv Res Grp, San Diego, CA 92122 USA
Funding
National Science Foundation (USA);
Keywords
LOW-INPUT; CONSTRUCTION; CHROMATIN; DATABASE; READS;
DOI
10.1101/gr.178319.114
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to >1 megabase. These pools are "subhaploid," in that the lengths of fragments contained in each pool sum to ~5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data is mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate "joins" are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
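The pool co-occurrence idea in the abstract can be illustrated with a small sketch. This is not the fragScaff implementation: the hypergeometric score, the `min_score` threshold, the function names, and the toy data are all assumptions chosen for illustration. Each contig end is represented by the set of indexed pools in which reads mapped to it; pairs of ends from different contigs that share more pools than expected by chance become candidate joins, and a Kruskal-style pass that keeps the strongest non-cycle-forming joins stands in for the minimum spanning tree step (equivalent to a minimum spanning forest on the negated scores).

```python
from itertools import combinations
from math import comb, log10


def shared_pool_score(pools_a, pools_b, n_pools):
    """-log10 of the hypergeometric probability of sharing >= k pools by chance."""
    a, b = len(pools_a), len(pools_b)
    k = len(pools_a & pools_b)
    total = comb(n_pools, b)
    tail = sum(comb(a, i) * comb(n_pools - a, b - i)
               for i in range(k, min(a, b) + 1))
    p = tail / total
    return -log10(p) if p > 0 else float("inf")


def spanning_forest_joins(end_pools, n_pools, min_score=3.0):
    """Pick candidate joins between contig ends, strongest first, avoiding cycles.

    end_pools maps (contig_name, "L" or "R") to the set of pool indices
    observed at that contig end. min_score is an illustrative cutoff.
    """
    edges = []
    for e1, e2 in combinations(sorted(end_pools), 2):
        if e1[0] == e2[0]:
            continue  # skip the two ends of the same contig
        score = shared_pool_score(end_pools[e1], end_pools[e2], n_pools)
        if score >= min_score:
            edges.append((score, e1, e2))

    # Union-find over contig names so that accepted joins never form a cycle.
    parent = {end[0]: end[0] for end in end_pools}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    joins = []
    for _score, e1, e2 in sorted(edges, reverse=True):
        r1, r2 = find(e1[0]), find(e2[0])
        if r1 != r2:
            parent[r1] = r2
            joins.append((e1, e2))
    return joins


# Toy data (hypothetical): 100 pools, three contigs, two true adjacencies.
toy_ends = {
    ("c1", "L"): {50, 51, 52},
    ("c1", "R"): {1, 2, 3, 4, 5, 6},
    ("c2", "L"): {2, 3, 4, 5, 6, 7},
    ("c2", "R"): {20, 21, 22, 23, 24},
    ("c3", "L"): {21, 22, 23, 24, 25},
    ("c3", "R"): {70, 71, 72},
}
joins = spanning_forest_joins(toy_ends, n_pools=100)
```

On the toy data, the right end of `c1` and the left end of `c2` share five pools, and the right end of `c2` and the left end of `c3` share four, so those two joins are retained while ends with no shared pools are left unlinked. A real scaffolder would also need to handle orientation, ends that participate in several joins, and contigs shorter than the fragments, none of which this sketch attempts.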
Pages: 2041-2049
Page count: 9
Related papers
50 in total
  • [41] ntLink: A Toolkit for De Novo Genome Assembly Scaffolding and Mapping Using Long Reads
    Coombe, Lauren
    Warren, Rene L.
    Wong, Johnathan
    Nikolic, Vladimir
    Birol, Inanc
    CURRENT PROTOCOLS, 2023, 3 (04):
  • [42] Mapping and De Novo Assembly of Long Barcoded Molecules of DNA from a Cancer Genome
    Reifenberger, Jeff G.
    Dzakula, Zeljko
    Dergachev, Vladimir
    Anantharaman, Thomas
    Hastie, Alex
    Chan, Saki
    Cao, Han
    BIOPHYSICAL JOURNAL, 2015, 108 (02) : 149A - 149A
  • [43] De novo assembly of the complex genome of Nippostrongylus brasiliensis using MinION long reads
    Eccles, David
    Chandler, Jodie
    Camberis, Mali
    Henrissat, Bernard
    Koren, Sergey
    Le Gros, Graham
    Ewbank, Jonathan J.
    BMC BIOLOGY, 2018, 16
  • [45] MuffinEc: Error correction for de Novo assembly via greedy partitioning and sequence alignment
    Alic, Andy S.
    Tomas, Andres
    Medina, Ignacio
    Blanquer, Ignacio
    INFORMATION SCIENCES, 2016, 329 : 206 - 219
  • [46] Toward long-range adaptive communication via information centric networking
    Dowling A.
    Huie L.
    Njilla L.
    Zhao H.
    Liu Y.
    Intelligent and Converged Networks, 2021, 2 (01): : 1 - 15
  • [47] Robust Error Correction for De Novo Assembly via Spectral Partitioning and Sequence Alignment
    Alic, Andrei
    Tomas, Andres
    Salavert, Jose
    Medina, Ignacio
    Blanquer, Ignacio
    PROCEEDINGS IWBBIO 2014: INTERNATIONAL WORK-CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING, VOLS 1 AND 2, 2014, : 1040 - 1048
  • [48] De novo determination of protein structure by NMR using orientational and long-range order restraints
    Hus, JC
    Marion, D
    Blackledge, M
    JOURNAL OF MOLECULAR BIOLOGY, 2000, 298 (05) : 927 - 936
  • [49] Chromosome-scale shotgun assembly using an in vitro method for long-range linkage
    Putnam, Nicholas H.
    O'Connell, Brendan L.
    Stites, Jonathan C.
    Rice, Brandon J.
    Blanchette, Marco
    Calef, Robert
    Troll, Christopher J.
    Fields, Andrew
    Hartley, Paul D.
    Sugnet, Charles W.
    Haussler, David
    Rokhsar, Daniel S.
    Green, Richard E.
    GENOME RESEARCH, 2016, 26 (03) : 342 - 350
  • [50] De novo sequence assembly of Albugo candida reveals a small genome relative to other biotrophic oomycetes
    Links, Matthew G.
    Holub, Eric
    Jiang, Rays H. Y.
    Sharpe, Andrew G.
    Hegedus, Dwayne
    Beynon, Elena
    Sillito, Dean
    Clarke, Wayne E.
    Uzuhashi, Shihomi
    Borhan, Mohammad H.
    BMC GENOMICS, 2011, 12