In vitro, long-range sequence information for de novo genome assembly via transposase contiguity

Cited by: 128
Authors
Adey, Andrew [1]
Kitzman, Jacob O. [1]
Burton, Joshua N. [1]
Daza, Riza [1]
Kumar, Akash [1]
Christiansen, Lena [2]
Ronaghi, Mostafa [2]
Amini, Sasan [2]
Gunderson, Kevin L. [2]
Steemers, Frank J. [2]
Shendure, Jay [1]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98115 USA
[2] Illumina Inc, Adv Res Grp, San Diego, CA 92122 USA
Funding
National Science Foundation (USA)
Keywords
LOW-INPUT; CONSTRUCTION; CHROMATIN; DATABASE; READS
DOI
10.1101/gr.178319.114
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprising 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to >1 megabase. These pools are "subhaploid," in that the lengths of the fragments contained in each pool sum to ~5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data are mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate "joins" are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof of concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
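The scaffolding idea in the abstract (score pairs of contig ends by how often they share indexed pools, build a graph of candidate joins, resolve it with a minimum spanning tree) can be illustrated with a toy sketch. This is not the fragScaff implementation: the Jaccard score, the `min_score` threshold, the function names, and the example pool sets are all assumptions made for illustration, and the spanning tree is computed with a plain Kruskal pass over contigs.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of pools shared by two contig ends (assumed scoring, not fragScaff's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_joins(pools_by_end, min_score=0.1):
    """Return (weight, end1, end2) for end pairs on different contigs.

    pools_by_end maps (contig, side) -> set of pool indices with reads at that end.
    Weight is 1 - score, so strongly co-occurring ends get the lowest weight.
    """
    joins = []
    for e1, e2 in combinations(sorted(pools_by_end), 2):
        if e1[0] == e2[0]:
            continue  # two ends of the same contig are not a join
        score = jaccard(pools_by_end[e1], pools_by_end[e2])
        if score >= min_score:
            joins.append((1.0 - score, e1, e2))
    return joins


def minimum_spanning_forest(joins):
    """Kruskal's algorithm over contigs; keeps the lowest-weight acyclic joins."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for _, e1, e2 in sorted(joins):
        r1, r2 = find(e1[0]), find(e2[0])
        if r1 != r2:  # skip joins that would close a cycle
            parent[r1] = r2
            kept.append((e1, e2))
    return kept


# Hypothetical pool memberships for the left (L) and right (R) ends of three contigs.
pools_by_end = {
    ("A", "L"): {10, 11, 12},
    ("A", "R"): {1, 2, 3, 4},
    ("B", "L"): {1, 2, 3, 5},
    ("B", "R"): {6, 7, 8},
    ("C", "L"): {6, 7, 9},
    ("C", "R"): {12, 13, 14},
}

joins = candidate_joins(pools_by_end)
kept = minimum_spanning_forest(joins)
print(kept)
```

In this toy input, the weak A.L to C.R link would close a cycle A-B-C-A, so the spanning-tree step discards it and keeps only the two stronger joins. A real pipeline would also need a statistical model of chance co-occurrence across the 9216 pools, which this sketch omits.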
Pages: 2041-2049
Number of pages: 9