In vitro, long-range sequence information for de novo genome assembly via transposase contiguity

Cited by: 128
Authors
Adey, Andrew [1]
Kitzman, Jacob O. [1]
Burton, Joshua N. [1]
Daza, Riza [1]
Kumar, Akash [1]
Christiansen, Lena [2]
Ronaghi, Mostafa [2]
Amini, Sasan [2]
Gunderson, Kevin L. [2]
Steemers, Frank J. [2]
Shendure, Jay [1]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98115 USA
[2] Illumina Inc, Adv Res Grp, San Diego, CA 92122 USA
Funding
National Science Foundation (USA);
Keywords
LOW-INPUT; CONSTRUCTION; CHROMATIN; DATABASE; READS;
DOI
10.1101/gr.178319.114
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to >1 megabase. These pools are "subhaploid," in that the lengths of fragments contained in each pool sum to ~5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data is mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate "joins" are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
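The pool co-occurrence idea in the abstract can be illustrated with a small sketch. This is not the fragScaff implementation: the Jaccard overlap score, the `min_score` threshold, the function names, and the toy pool assignments are all assumptions made for illustration. It only shows the general shape of the approach: score pairs of contig ends by the indexed pools they share, keep strong pairs as candidate joins, and resolve the resulting graph with a minimum spanning tree (here Kruskal's algorithm, with weight = 1 - score so that strong links are picked first).

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of indexed pools shared by two contig ends."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_joins(end_pools, min_score=0.2):
    """Score pairs of ends from different contigs by pool overlap.

    end_pools maps (contig, side) -> set of pool indices with reads
    mapped near that end. min_score is a hypothetical cutoff.
    """
    joins = []
    for (e1, p1), (e2, p2) in combinations(sorted(end_pools.items()), 2):
        if e1[0] == e2[0]:  # skip the two ends of the same contig
            continue
        score = jaccard(p1, p2)
        if score >= min_score:
            joins.append((1.0 - score, e1, e2))  # low weight = strong link
    return joins


def spanning_forest(joins):
    """Kruskal's algorithm over contigs; returns the accepted joins."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    accepted = []
    for weight, e1, e2 in sorted(joins):
        r1, r2 = find(e1[0]), find(e2[0])
        if r1 != r2:  # joining two separate components creates no cycle
            parent[r1] = r2
            accepted.append((e1, e2, round(1.0 - weight, 3)))
    return accepted


# Toy data (made up): pools observed at the left (L) and right (R) ends
# of three contigs A, B, and C.
end_pools = {
    ("A", "L"): {10, 11},
    ("A", "R"): {1, 2, 3, 4},
    ("B", "L"): {1, 2, 3, 5},
    ("B", "R"): {6, 7, 8},
    ("C", "L"): {6, 7, 9},
    ("C", "R"): {12, 13},
}

for end1, end2, score in spanning_forest(candidate_joins(end_pools)):
    print(end1, "<->", end2, "score", score)
```

A spanning tree alone does not guarantee a linear ordering, since a contig could end up with more than two neighbours; a real scaffolder would need further steps to extract paths and orient contigs, which this sketch leaves out.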
Pages: 2041-2049
Page count: 9
Related papers
50 in total
  • [21] De novo diploid genome assembly using long noisy reads
    Fan Nie
    Peng Ni
    Neng Huang
    Jun Zhang
    Zhenyu Wang
    Chuanle Xiao
    Feng Luo
    Jianxin Wang
    Nature Communications, 15
  • [22] De novo diploid genome assembly using long noisy reads
    Nie, Fan
    Ni, Peng
    Huang, Neng
    Zhang, Jun
    Wang, Zhenyu
    Xiao, Chuanle
    Luo, Feng
    Wang, Jianxin
    NATURE COMMUNICATIONS, 2024, 15 (01)
  • [23] The sequence and de novo assembly of the genome of the Indian oil sardine, Sardinella longiceps
    Sukumaran, Sandhya
    Sebastian, Wilson
    Gopalakrishnan, A.
    Mathew, Oommen K.
    Vysakh, V. G.
    Rohit, Prathibha
    Jena, J. K.
    SCIENTIFIC DATA, 2023, 10 (01)
  • [24] The sequence and de novo assembly of the genome of the Indian oil sardine, Sardinella longiceps
    Sandhya Sukumaran
    Wilson Sebastian
    A. Gopalakrishnan
    Oommen K. Mathew
    V. G. Vysakh
    Prathibha Rohit
    J. K. Jena
    Scientific Data, 10
  • [25] Integrating long-range connectivity information into de Bruijn graphs
    Turner, Isaac
    Garimella, Kiran V.
    Iqbal, Zamin
    McVean, Gil
    BIOINFORMATICS, 2018, 34 (15) : 2556 - 2565
  • [26] Exploiting orthology and de novo transcriptome assembly to refine target sequence information
    Soellner, Julia F.
    Leparc, German
    Zwick, Matthias
    Schoenberger, Tanja
    Hildebrandt, Tobias
    Nieselt, Kay
    Simon, Eric
    BMC MEDICAL GENOMICS, 2019, 12 (1)
  • [27] Exploiting orthology and de novo transcriptome assembly to refine target sequence information
    Julia F. Söllner
    Germán Leparc
    Matthias Zwick
    Tanja Schönberger
    Tobias Hildebrandt
    Kay Nieselt
    Eric Simon
    BMC Medical Genomics, 12
  • [28] De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data
    DiGuistini, Scott
    Liao, Nancy Y.
    Platt, Darren
    Robertson, Gordon
    Seidel, Michael
    Chan, Simon K.
    Docking, T. Roderick
    Birol, Inanc
    Holt, Robert A.
    Hirst, Martin
    Mardis, Elaine
    Marra, Marco A.
    Hamelin, Richard C.
    Bohlmann, Joerg
    Breuil, Colette
    Jones, Steven J. M.
    GENOME BIOLOGY, 2009, 10 (09)
  • [29] De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data
    Scott DiGuistini
    Nancy Y Liao
    Darren Platt
    Gordon Robertson
    Michael Seidel
    Simon K Chan
    T Roderick Docking
    Inanc Birol
    Robert A Holt
    Martin Hirst
    Elaine Mardis
    Marco A Marra
    Richard C Hamelin
    Jörg Bohlmann
    Colette Breuil
    Steven JM Jones
    Genome Biology, 10
  • [30] Linear time complexity de novo long read genome assembly with GoldRush
    Wong, Johnathan
    Coombe, Lauren
    Nikolic, Vladimir
    Zhang, Emily
    Nip, Ka Ming
    Sidhu, Puneet
    Warren, Rene L.
    Birol, Inanc
    NATURE COMMUNICATIONS, 2023, 14 (01)